Automatic Mixed Precision for Deep Learning

Deep Neural Network training has traditionally relied on IEEE single-precision format, however with mixed precision, you can train with half precision while maintaining the network accuracy achieved with single precision. This technique of using both single- and half-precision representations is referred to as mixed precision technique.

Benefits of Mixed precision training

Speeds up math-intensive operations, such as linear and convolution layers, by using Tensor Cores.

Speeds up memory-limited operations by accessing half the bytes compared to single-precision.

Reduces memory requirements for training models, enabling larger models or larger minibatches.

Nuance Research advances and applies conversational AI technologies to power solutions that redefine how humans and computers interact. The rate of our advances reflects the speed at which we train and assess deep learning models. With Automatic Mixed Precision, weâ€™ve realized a 50% speedup in TensorFlow-based ASR model training without loss of accuracy via a minimal code change. Weâ€™re eager to achieve a similar impact in our other deep learning language processing applications.
Wenxuan Teng, Senior Research Manager, Nuance Communications

Enabling mixed precision involves two steps: porting the model to use the half-precision data type where appropriate, and using loss scaling to preserve small gradient values. Deep learning researchers and engineers can easily get started enabling this feature on Ampere, Volta and Turing GPUs.

On Ampere GPUs, automatic mixed precision uses FP16 to deliver a performance boost of 3X versus TF32, the new format which is already ~6x faster than FP32. On Volta and Turing GPUs, automatic mixed precision delivers up to 3X higher performance vs FP32 with just a few lines of code. The best training performance on NVIDIA GPUs is always available on the NVIDIA deep learning performance page.

Using Automatic Mixed Precision for Major Deep Learning Frameworks

TensorFlow

Automatic Mixed Precision is available both in native TensorFlow and inside the TensorFlow container on NVIDIA NGC container registry. To enable AMP in NGC TensorFlow 19.07 or upstream TensorFlow 1.14 or later, wrap your tf.train or tf.keras.optimizers Optimizer as follows:


opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)

This change applies automatic loss scaling to your model and enables automatic casting to half precision.

â€œAutomated mixed precision powered by NVIDIA Tensor Core GPUs on Alibaba allows us to instantly speedup AI models nearly 3X. Our researchers appreciated the ease of turning on this feature to instantly accelerate our AI.â€

â€” Wei Linï¼ŒSenior Director at Alibaba Computing Platform, Alibaba

Try with TensorFlow

Developer Blog

Documentation

PyTorch

Automatic Mixed Precision feature is available in the Apex repository on GitHub. To enable, add these two lines of code into your existing training script:

scaler = GradScaler()

with autocast(): ????output = model(input) ????loss = loss_fn(output, target)

scaler.scale(loss).backward()

scaler.step(optimizer)

scaler.update()

Try with PyTorch

Developer Blog

Documentation

MXNet

Automatic Mixed Precision feature is available both in native MXNet (1.5 or later) and inside the MXNet container (19.04 or later) on NVIDIA NGC container registry. To enable the feature, add the following lines of code to your existing training script:

amp.init()
amp.init_trainer(trainer)
with amp.scale_loss(loss, trainer) as scaled_loss:
???autograd.backward(scaled_loss)

Try with MXNet

Documentation

PaddlePaddle

Automatic Mixed Precision feature is available in PaddlePaddle on GitHub. To enable, add these two lines of code into your existing training script:

sgd = SGDOptimizer()
mp_sgd = fluid.contrib.mixed_precision.decorator.decorate(sgd)
mp_sgd.minimize(loss)

Try with PaddlePaddle

Additional Resources

Webinar: Mixed-Precision Training of Neural Networks
AIâ€™s Latest Precision Format TensorFloat-32 Delivers Dramatic Out-of-the-box Performance While Preserving Accuracy
Learn more: Tensor Cores for developers
NVIDIA Ampere Architecture In-Depth
Whatâ€™s the Difference Between Single-, Double-, Multi- and Mixed-Precision Computing?
Tensor Core Optimized Examples
Developer Blog: Automatic Mixed Precision for NVIDIA Tensor Core Architecture in TensorFlow

äººäººè¶…ç¢°97caoporenå›½äº§