In this post, we detail the recently released NVIDIA Time Series Prediction Platform (TSPP), a tool designed to compare easily and experiment with arbitrary combinations of forecasting models, time-series datasets, and other configurations. The TSPP also provides functionality to explore the hyperparameter search space, run accelerated model training using distributed training and Automatic Mixed��
]]>The NVIDIA A100, based on the NVIDIA Ampere GPU architecture, offers a suite of exciting new features: third-generation Tensor Cores, Multi-Instance GPU (MIG) and third-generation NVLink. Ampere Tensor Cores introduce a novel math mode dedicated for AI training: the TensorFloat-32 (TF32). TF32 is designed to accelerate the processing of FP32 data types, commonly used in DL workloads.
]]>As computers and other personal devices have become increasingly prevalent, interest in conversational AI has grown due to its multitude of potential applications in a variety of situations. Each conversational AI framework is comprised of several more basic modules such as automatic speech recognition (ASR), and the models for these need to be lightweight in order to be effectively deployed on��
]]>This post has been updated with Announcing NVIDIA NeMo: Fast Development of Speech and Language Models. The new version has information about pretrained models in NGC and fine-tuning models on custom dataset sections, upgrades the NeMo diagram with the text-to-speech collection, and replaces the AN4 dataset in the example with the LibriSpeech dataset. As a researcher building state-of-the��
]]>Earlier this year in March, we showed retinanet-examples, an open source example of how to accelerate the training and deployment of an object detection pipeline for GPUs. We presented the project at NVIDIA��s GPU Technology Conference in San Jose. This post discusses the motivation for this work, a high-level description of the architecture, and a brief look under-the-hood at the optimizations we��
]]>Whether to employ mixed precision to train your TensorFlow models is no longer a tough decision. NVIDIA��s Automatic Mixed Precision (AMP) feature for TensorFlow, recently announced at the 2019 GTC, features automatic mixed precision training by making all the required model and optimizer adjustments internally within TensorFlow with minimal programmer intervention.
]]>Most deep learning frameworks, including PyTorch, train using 32-bit floating point (FP32) arithmetic by default. However, using FP32 for all operations is not essential to achieve full accuracy for many state-of-the-art deep neural networks (DNNs). In 2017, NVIDIA researchers developed a methodology for mixed-precision training in which a few operations are executed in FP32 while the majority��
]]>