Automatic Mixed Precision (AMP) – NVIDIA Technical Blog
News and tutorials for developers, data scientists, and IT admins

Time Series Forecasting with the NVIDIA Time Series Prediction Platform and Triton Inference Server
Kyle Kranen, 2022-02-15, http://www.open-lab.net/blog/?p=44168

In this post, we detail the recently released NVIDIA Time Series Prediction Platform (TSPP), a tool designed to make it easy to compare and experiment with arbitrary combinations of forecasting models, time-series datasets, and other configurations. The TSPP also provides functionality to explore the hyperparameter search space and to run accelerated model training using distributed training and Automatic Mixed Precision (AMP).
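
The TSPP exposes these capabilities through configuration rather than hand-written training code, but the mixed-precision acceleration it refers to follows the standard PyTorch pattern. The sketch below is a generic illustration of that pattern with torch.cuda.amp, not the TSPP API itself; the toy model, data, and hyperparameters are placeholders.

import torch

# Toy forecasting-style regression model and synthetic data (placeholders).
model = torch.nn.Linear(64, 1).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling for FP16

for step in range(100):
    x = torch.randn(256, 64, device="cuda")
    y = torch.randn(256, 1, device="cuda")

    optimizer.zero_grad()
    # autocast runs eligible ops (e.g., GEMMs) in reduced precision
    # while keeping precision-sensitive ops in FP32.
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.mse_loss(model(x), y)

    scaler.scale(loss).backward()   # backward pass on the scaled loss
    scaler.step(optimizer)          # unscale gradients, then optimizer.step()
    scaler.update()                 # adjust the loss scale for the next step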

Accelerating TensorFlow on NVIDIA A100 GPUs
Vinh Nguyen, 2020-07-24, http://www.open-lab.net/blog/?p=18957

The NVIDIA A100, based on the NVIDIA Ampere GPU architecture, offers a suite of exciting new features: third-generation Tensor Cores, Multi-Instance GPU (MIG), and third-generation NVLink. Ampere Tensor Cores introduce a new math mode dedicated to AI training: TensorFloat-32 (TF32). TF32 is designed to accelerate the processing of FP32 data types, which are commonly used in deep learning workloads.
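
As a quick illustration (assuming TensorFlow 2.4 or later, which is newer than the release the post originally targeted), TensorFlow exposes a switch for TF32 execution; on Ampere GPUs it is enabled by default for eligible float32 matrix multiplications and convolutions.

import tensorflow as tf

# TF32 is on by default for float32 matmuls/convolutions on Ampere GPUs.
# Query the current setting (TF 2.4+).
print(tf.config.experimental.tensor_float_32_execution_enabled())

# Fall back to full FP32 precision, e.g., for numerical comparisons:
tf.config.experimental.enable_tensor_float_32_execution(False)

# Re-enable TF32 Tensor Core execution:
tf.config.experimental.enable_tensor_float_32_execution(True)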

Develop Smaller Speech Recognition Models with the NVIDIA NeMo Framework
Jocelyn Huang, 2019-12-10, http://www.open-lab.net/blog/?p=16063

As computers and other personal devices have become increasingly prevalent, interest in conversational AI has grown because of its many potential applications. Each conversational AI framework is composed of several more basic modules, such as automatic speech recognition (ASR), and the models behind these modules need to be lightweight in order to be deployed effectively on such devices.
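
As a minimal sketch of loading one of these compact pretrained ASR models with a current NeMo release (the model name and audio path are illustrative assumptions, not taken from the post):

# pip install nemo_toolkit[asr]
import nemo.collections.asr as nemo_asr

# QuartzNet is one of the lightweight convolutional ASR architectures
# available as a pretrained checkpoint on NGC.
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(
    model_name="QuartzNet15x5Base-En"
)

# Transcribe a 16 kHz mono WAV file (placeholder path).
transcripts = asr_model.transcribe(["sample_audio.wav"])
print(transcripts[0])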

Neural Modules for Fast Development of Speech and Language Models
Raghav Mani, 2019-09-14, http://www.open-lab.net/blog/?p=15664

This post has been updated with Announcing NVIDIA NeMo: Fast Development of Speech and Language Models. The new version adds sections on pretrained models in NGC and on fine-tuning models with a custom dataset, updates the NeMo diagram to include the text-to-speech collection, and replaces the AN4 dataset in the example with the LibriSpeech dataset. As a researcher building state-of-the-art…

Creating an Object Detection Pipeline for GPUs
Prethvi Kashinkunti, 2019-06-19, http://www.open-lab.net/blog/?p=14734

Earlier this year, in March, we showed retinanet-examples, an open-source example of how to accelerate the training and deployment of an object detection pipeline on GPUs. We presented the project at NVIDIA's GPU Technology Conference in San Jose. This post discusses the motivation for this work, gives a high-level description of the architecture, and takes a brief look under the hood at the optimizations we made.

Automatic Mixed Precision for NVIDIA Tensor Core Architecture in TensorFlow
Amulya Vishwanath, 2019-03-18, http://www.open-lab.net/blog/?p=14054

Whether to employ mixed precision to train your TensorFlow models is no longer a tough decision. NVIDIA's Automatic Mixed Precision (AMP) feature for TensorFlow, announced at GTC 2019, enables mixed-precision training by making all the required model and optimizer adjustments internally within TensorFlow, with minimal programmer intervention.
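
In the TensorFlow 1.x releases this post targets, AMP can be turned on either with an environment variable or with an explicit optimizer wrapper; the snippet below is a minimal sketch of both routes (the optimizer choice and learning rate are placeholders).

import os
import tensorflow as tf

# Option 1: enable the automatic mixed-precision graph rewrite globally
# (the environment variable used with the NGC TensorFlow containers).
os.environ["TF_ENABLE_AUTO_MIXED_PRECISION"] = "1"

# Option 2 (TF 1.14+): wrap the optimizer explicitly.
optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
optimizer = tf.train.experimental.enable_mixed_precision_graph_rewrite(optimizer)

# The rewrite inserts FP16 casts where they are numerically safe and adds
# automatic loss scaling, so the rest of the training script is unchanged.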

NVIDIA Apex: Tools for Easy Mixed-Precision Training in PyTorch
Carl Case, 2018-12-03, http://www.open-lab.net/blog/?p=12951

Most deep learning frameworks, including PyTorch, train using 32-bit floating-point (FP32) arithmetic by default. However, using FP32 for all operations is not essential to achieve full accuracy for many state-of-the-art deep neural networks (DNNs). In 2017, NVIDIA researchers developed a methodology for mixed-precision training in which a few operations are executed in FP32 while the majority run in half precision (FP16).
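
A minimal sketch of the Apex amp API the post describes (the model, data, and opt_level below are placeholder choices; "O1" is one of several casting policies Apex offers):

import torch
from apex import amp  # https://github.com/NVIDIA/apex

# Tiny placeholder model and synthetic data, just to exercise the API.
model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()

# Patch the model and optimizer for mixed precision. "O1" keeps master
# weights in FP32 and casts whitelisted ops (e.g., GEMMs) to FP16.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

for step in range(10):
    inputs = torch.randn(32, 128, device="cuda")
    targets = torch.randint(0, 10, (32,), device="cuda")

    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)

    # scale_loss applies dynamic loss scaling before backward() so that
    # small FP16 gradients do not underflow to zero.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()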
