Efficiency is paramount in industrial manufacturing, where even minor gains can have significant financial implications. According to the American Society for Quality, "Many organizations will have true quality-related costs as high as 15-20% of sales revenue, some going as high as 40% of total operations." These staggering statistics reveal a stark reality: defects in industrial applications not…
Deep learning models require hundreds of gigabytes of data to generalize well on unseen samples. Data augmentation helps by increasing the variability of examples in datasets. The traditional approach to data augmentation dates back to statistical learning, when the choice of augmentations relied on the domain knowledge, skill, and intuition of the engineers who set up the model training.
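As a concrete illustration of such hand-designed augmentation, here is a minimal sketch using TensorFlow's Keras preprocessing layers; the particular transforms and their parameters are illustrative assumptions, not taken from the post:

```python
import tensorflow as tf

# Hand-chosen augmentation pipeline; each layer perturbs images at training time.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),   # rotate up to ~36 degrees
    tf.keras.layers.RandomZoom(0.1),
])

images = tf.random.uniform([8, 224, 224, 3])     # stand-in batch
augmented = augment(images, training=True)       # new random variants each call
```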
Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. We're excited to announce the NVIDIA Quantization-Aware Training (QAT) Toolkit for TensorFlow 2, with the goal of accelerating quantized networks with NVIDIA TensorRT on NVIDIA GPUs. This toolkit provides you with an easy-to-use API to quantize…
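The excerpt cuts off before showing the toolkit's API. As a stand-in, the following sketch uses the open-source tensorflow_model_optimization package, which exposes a comparable quantize-the-whole-model entry point; this is not the NVIDIA toolkit's own API:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Wrap a Keras model with fake-quantization nodes for quantization-aware training.
base = tf.keras.applications.MobileNetV2(weights=None, input_shape=(224, 224, 3))
qat_model = tfmot.quantization.keras.quantize_model(base)
qat_model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy")
# Fine-tune qat_model as usual; the learned quantization ranges can then be
# exported (for example, through ONNX) for deployment with TensorRT.
```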
Deep learning recommender systems often use large embedding tables, which can be difficult to fit in GPU memory. This post shows you how to combine the model-parallel and data-parallel training paradigms to solve this memory issue and train large deep learning recommender systems more quickly. I share the steps that my team took to efficiently train a 113-billion-parameter…
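The post's own training code is not included in this excerpt; the sketch below illustrates only the model-parallel half of the idea, sharding one large embedding table across GPUs by ID. The sizes and the modulo routing scheme are assumptions for illustration:

```python
import tensorflow as tf

NUM_GPUS, VOCAB, DIM = 2, 1_000_000, 64  # assumed toy sizes

# Model parallel: each GPU owns a disjoint shard of the embedding table.
shards = []
for g in range(NUM_GPUS):
    with tf.device(f"/GPU:{g}"):
        shards.append(tf.Variable(
            tf.random.normal([VOCAB // NUM_GPUS, DIM]), name=f"emb_shard_{g}"))

def lookup(ids):
    """Route each ID to its owning shard and gather locally.

    Note: results come back grouped by shard, not in input order; a real
    implementation would scatter them back to the original positions.
    """
    parts = []
    for g in range(NUM_GPUS):
        local_ids = tf.boolean_mask(ids, tf.equal(ids % NUM_GPUS, g)) // NUM_GPUS
        with tf.device(f"/GPU:{g}"):
            parts.append(tf.gather(shards[g], local_ids))
    return tf.concat(parts, axis=0)
```

Data parallelism would then replicate the dense part of the model across the same GPUs, with each replica calling `lookup` on its local batch.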
Applying new technology to studying ancient history, researchers are looking to expand their understanding of dinosaurs with a new AI algorithm. The study, published in Frontiers in Earth Science, uses high-resolution computed tomography (CT) imaging combined with deep learning models to scan and evaluate dinosaur fossils. The research is a step toward creating a new tool that would vastly change…
Gran Turismo (GT) Sport competitors are facing a new, AI-supercharged contender thanks to the latest collaborative effort from Sony AI, Sony Interactive Entertainment (SIE), and Polyphony Digital Inc., the developers behind GT Sport. The autonomous AI racing agent, known as Gran Turismo Sophy (GT Sophy), recently beat the world's best drivers in GT Sport. Published in Nature…
The NVIDIA NGC catalog is a hub for GPU-optimized deep learning, machine learning, and HPC applications. With highly performant software containers, pretrained models, industry-specific SDKs, and Jupyter Notebooks, the content helps simplify and accelerate end-to-end workflows. New features, software, and updates to help you streamline your workflow and build your solutions faster on NGC…
A team of researchers from Princeton and the University of Washington created a new camera that captures stunning images and measures in at only half a millimeter, the size of a coarse grain of salt. The new study, published in Nature Communications, outlines the use of optical metasurfaces with machine learning to produce high-quality color imagery with a wide field of view.
Recommender systems are the economic engine of the Internet. It is hard to imagine any other type of application with a more direct impact on our daily digital lives: trillions of items to be recommended to billions of people. Recommender systems filter products and services among an overwhelming number of options, easing the paradox of choice that most users face. As the amount of data…
In part 1 of this series, we introduced the new API functions, cudaMallocAsync and cudaFreeAsync, that enable memory allocation and deallocation to be stream-ordered operations. In this post, we highlight the benefits of this new capability by sharing some big data benchmark results and provide a code migration guide for modifying your existing applications. We also cover advanced topics to take advantage of stream-ordered…
Most CUDA developers are familiar with the cudaMalloc and cudaFree API functions used to allocate GPU-accessible memory. However, there has long been an obstacle with these API functions: they aren't stream-ordered. In this post, we introduce the new API functions, cudaMallocAsync and cudaFreeAsync, that enable memory allocation and deallocation to be stream-ordered operations. In part 2 of this series, we highlight the benefits of this new…
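The posts themselves work in CUDA C++; as a hedged Python-side illustration of the same stream-ordered pool, CuPy (version 9 or later, on CUDA 11.2+) exposes an experimental MemoryAsyncPool built on cudaMallocAsync/cudaFreeAsync:

```python
import cupy as cp

# Experimental: route CuPy allocations through cudaMallocAsync/cudaFreeAsync
# so that allocation and deallocation become stream-ordered operations.
pool = cp.cuda.MemoryAsyncPool()
cp.cuda.set_allocator(pool.malloc)

stream = cp.cuda.Stream()
with stream:
    x = cp.arange(1 << 20, dtype=cp.float32)  # allocation ordered on `stream`
    y = x * 2.0                               # kernel ordered after the alloc
stream.synchronize()
```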
New research out of the University of California, San Francisco has given a paralyzed man the ability to communicate by translating his brain signals into computer-generated writing. The study, published in The New England Journal of Medicine, marks a significant milestone toward restoring communication for people who have lost the ability to speak. "To our knowledge, this is the first…
Recommender systems drive engagement on many of the most popular online platforms. As data volume grows exponentially, data scientists increasingly turn from traditional machine learning methods to highly expressive deep learning models to improve recommendation quality. Often, the recommendations are framed as modeling the completion of a user-item matrix, in which the user-item entry is the…
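To make the matrix-completion framing concrete, here is a minimal NumPy sketch that factorizes a sparse user-item matrix with SGD; all sizes and hyperparameters are illustrative assumptions:

```python
import numpy as np

# Toy sparse ratings matrix: ~10% of entries observed.
rng = np.random.default_rng(0)
n_users, n_items, k = 100, 50, 8
R = rng.random((n_users, n_items)) * (rng.random((n_users, n_items)) < 0.1)

# Factor R ~= U @ V.T; unobserved entries are then "completed" by U[u] @ V[i].
U = rng.normal(scale=0.1, size=(n_users, k))
V = rng.normal(scale=0.1, size=(n_items, k))
lr, reg = 0.05, 0.01
users, items = np.nonzero(R)  # indices of observed entries only

for epoch in range(20):
    for u, i in zip(users, items):
        err = R[u, i] - U[u] @ V[i]
        U[u] += lr * (err * V[i] - reg * U[u])
        V[i] += lr * (err * U[u] - reg * V[i])
```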
Recently, NVIDIA CEO Jensen Huang announced updates to the open beta of NVIDIA Merlin, an end-to-end framework that democratizes the development of large-scale deep learning recommenders. With NVIDIA Merlin, data scientists, machine learning engineers, and researchers can accelerate their entire workflow pipeline from ingesting and training to deploying GPU-accelerated recommenders (Figure 1).
Software profiling is key for achieving the best performance on a system, and that's true for data science and machine learning applications as well. In the era of GPU-accelerated deep learning, when profiling deep neural networks, it is important to understand CPU, GPU, and even memory bottlenecks, which could cause slowdowns in training or inference. In this post…
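One common profiling aid, though not necessarily the tool this post covers, is marking code regions with NVTX ranges so they appear on timelines in profilers such as Nsight Systems; a minimal sketch with the nvtx Python package (assumed installed via `pip install nvtx`):

```python
import time
import nvtx  # NVIDIA's NVTX bindings for Python (assumed available)

@nvtx.annotate("load_batch", color="blue")
def load_batch():
    time.sleep(0.01)  # stand-in for CPU-side data loading

with nvtx.annotate("train_step", color="green"):
    load_batch()
    # ... forward/backward passes would go here ...
```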
Imagine that you have just finished implementing an awesome, interactive, deep learning pipeline on your NVIDIA-accelerated data science workstation, using OpenCV to capture your webcam stream and render the output. A colleague of yours mentions that exploiting the novel TF32 compute mode of the third-generation Tensor Cores in the NVIDIA Ampere microarchitecture might significantly accelerate your…
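For reference, the capture-and-render part of such a pipeline typically looks like the following OpenCV sketch; the inference call is a placeholder, not code from the post:

```python
import cv2

cap = cv2.VideoCapture(0)  # default webcam
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # ... run the deep learning pipeline on `frame` here ...
    cv2.imshow("output", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # quit on 'q'
        break
cap.release()
cv2.destroyAllWindows()
```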
AI is going mainstream and is quickly becoming pervasive in every industry, from autonomous vehicles to drug discovery. However, developing and deploying AI applications is a challenging endeavor. The process requires building a scalable infrastructure by combining hardware, software, and intricate workflows, which can be time-consuming as well as error-prone. To accelerate the end-to-end AI…
The NVIDIA A100, based on the NVIDIA Ampere GPU architecture, offers a suite of exciting new features: third-generation Tensor Cores, Multi-Instance GPU (MIG), and third-generation NVLink. Ampere Tensor Cores introduce a novel math mode dedicated to AI training: TensorFloat-32 (TF32). TF32 is designed to accelerate the processing of FP32 data types, commonly used in DL workloads.
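In TensorFlow (2.4 and later), TF32 execution is enabled by default on Ampere GPUs and can be toggled explicitly; a quick sketch:

```python
import tensorflow as tf

# TF32 is on by default on Ampere; toggle it explicitly if needed.
tf.config.experimental.enable_tensor_float_32_execution(True)
print(tf.config.experimental.tensor_float_32_execution_enabled())  # True
```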
WSL2 is available on Windows 11 outside the Windows Insider Preview. For more information about what is supported, see the CUDA on WSL User Guide. In response to popular demand, Microsoft announced a new feature of the Windows Subsystem for Linux 2 (WSL 2), GPU acceleration, at the Build conference in May 2020. This feature opens the gate for many compute applications, professional tools…
Medical image segmentation is a hot topic in the deep learning community. Proof of that is the number of challenges, competitions, and research projects being conducted in this area, which only grows year over year. Among all the different approaches to this problem, U-Net has become the backbone of many of the top-performing solutions for both 2D and 3D segmentation tasks. This is due to its…
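For readers unfamiliar with the architecture, here is a deliberately tiny one-level U-Net sketch in Keras; real U-Nets stack four or more such encoder/decoder levels, and the shapes and filter counts here are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

inputs = tf.keras.Input((128, 128, 1))
c1 = conv_block(inputs, 32)                                        # encoder
p1 = layers.MaxPooling2D()(c1)                                     # downsample
c2 = conv_block(p1, 64)                                            # bottleneck
u1 = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(c2)  # upsample
c3 = conv_block(layers.Concatenate()([u1, c1]), 32)                # skip connection
outputs = layers.Conv2D(1, 1, activation="sigmoid")(c3)            # per-pixel mask
unet = tf.keras.Model(inputs, outputs)
```

The skip connection is the defining design choice: it lets the decoder recover fine spatial detail that pooling would otherwise discard.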
The process of building an AI-powered solution from start to finish can be daunting. First, datasets must be curated and preprocessed. Next, models need to be trained and tested for inference performance, and then finally deployed into a usable, customer-facing application. At each step along the way, developers constantly face time-consuming challenges, such as building efficient…
Deep learning in medical imaging has shown great potential for disease detection, localization, and classification within radiology. Deep learning holds the potential to create solutions that can detect conditions that might have been overlooked and can improve the efficiency and effectiveness of the radiology team. However, for this to happen, data scientists and radiologists need to collaborate…
Whether to employ mixed precision to train your TensorFlow models is no longer a tough decision. NVIDIA's Automatic Mixed Precision (AMP) feature for TensorFlow, recently announced at GTC 2019, enables mixed-precision training by making all the required model and optimizer adjustments internally within TensorFlow, with minimal programmer intervention.
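In the NVIDIA TensorFlow containers of that era, AMP could be switched on with a single environment variable before TensorFlow is imported; outside those containers, treat the exact variable name as an assumption:

```python
import os

# Enable NVIDIA's automatic mixed-precision graph rewrite.
os.environ["TF_ENABLE_AUTO_MIXED_PRECISION"] = "1"

import tensorflow as tf
# ... build and train the model as usual; casts and loss scaling are
# inserted automatically by the graph rewrite ...
```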
The pace of AI adoption across diverse industries depends on maximizing data scientists' productivity. NVIDIA releases optimized NGC containers every month with improved performance for deep learning frameworks and libraries, helping scientists maximize their potential. NVIDIA continuously invests in the full data science stack, including GPU architecture, systems, and software stacks.
Mixed precision combines different numerical precisions in a computational method. Using precision lower than FP32 reduces memory usage, allowing the deployment of larger neural networks. Data transfers take less time, and compute performance increases, especially on NVIDIA GPUs with Tensor Core support for that precision. Mixed-precision training of DNNs achieves two main objectives: This video…
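As a present-day Keras sketch of the same idea (not the video's own code), the mixed_float16 policy computes in FP16 while keeping variables in FP32, and a loss-scale optimizer guards against underflowing FP16 gradients:

```python
import tensorflow as tf
from tensorflow.keras import mixed_precision

mixed_precision.set_global_policy("mixed_float16")  # FP16 compute, FP32 variables

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Activation("softmax", dtype="float32"),  # keep outputs FP32
])
opt = mixed_precision.LossScaleOptimizer(tf.keras.optimizers.Adam())  # dynamic loss scaling
model.compile(optimizer=opt, loss="sparse_categorical_crossentropy")
```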
NVIDIA has released TensorRT 4 at CVPR 2018. This new version of TensorRT, NVIDIA's powerful inference optimizer and runtime engine, provides: … Additional features include the ability to execute custom neural network layers using FP16 precision and support for the Xavier SoC through NVIDIA DRIVE AI platforms. TensorRT 4 speeds up deep learning inference applications such as neural machine…
Update, May 9, 2018: TensorFlow v1.7 and above integrates with TensorRT 3.0.4. NVIDIA is working on supporting the integration for a wider set of configurations and versions. We'll publish updates when these become available. Meanwhile, if you're using , simply download the TensorRT files for Ubuntu 14.04, not 16.04, no matter what version of Ubuntu you're running.
NVIDIA TensorRT is a high-performance deep learning inference optimizer and runtime that delivers low-latency, high-throughput inference for deep learning applications. NVIDIA released TensorRT last year with the goal of accelerating deep learning inference for production deployment. In this post, we'll introduce TensorRT 3, which improves performance versus previous versions and includes new…
In part 1 of this series, I introduced Generative Adversarial Networks (GANs) and showed how to generate images of handwritten digits using a GAN. In this post, I will do something much more exciting: use Generative Adversarial Networks to generate images of celebrity faces. I am going to use CelebA [1], a dataset of 200,000 aligned and cropped 178 x 218-pixel RGB images of celebrities.
You heard it from the Deep Learning guru: Generative Adversarial Networks [2] are a very hot topic in Machine Learning. In this post, I will explore various ways of using a GAN to create previously unseen images. I provide source code in TensorFlow and a modified version of DIGITS that you are free to use if you wish to try it out yourself. Figure 1 gives a preview of what you will learn to do in…
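The post's own TensorFlow/DIGITS code is not reproduced in this excerpt; for orientation, here is a generic TF2-style GAN training step, where the generator G and discriminator D are assumed to be Keras models:

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(2e-4)
d_opt = tf.keras.optimizers.Adam(2e-4)

@tf.function
def train_step(G, D, real, z_dim=100):
    # Sample noise, generate fakes, and score both with the discriminator.
    z = tf.random.normal([tf.shape(real)[0], z_dim])
    with tf.GradientTape() as gt, tf.GradientTape() as dt:
        fake = G(z, training=True)
        d_real = D(real, training=True)
        d_fake = D(fake, training=True)
        # D learns to label real as 1 and fake as 0; G learns to fool D.
        d_loss = bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)
        g_loss = bce(tf.ones_like(d_fake), d_fake)
    g_opt.apply_gradients(zip(gt.gradient(g_loss, G.trainable_variables),
                              G.trainable_variables))
    d_opt.apply_gradients(zip(dt.gradient(d_loss, D.trainable_variables),
                              D.trainable_variables))
    return g_loss, d_loss
```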