AI is transforming computing, and inference is how the capabilities of AI are deployed in the world's applications. Intelligent chatbots, image and video synthesis from simple text prompts, personalized content recommendations, and medical imaging are just a few examples of AI-powered applications. Inference workloads are both computationally demanding and diverse, requiring that platforms be…
Join the free NVIDIA Developer Program and enroll in a course from the NVIDIA Deep Learning Institute.
The training stage of deep learning (DL) models consists of learning numerous dense floating-point weight matrices, which results in a massive amount of floating-point computations during inference. Research has shown that many of those computations can be skipped by forcing some weights to be zero, with little impact on the final accuracy. In parallel to that, previous posts have shown that…
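To make the idea of skipping computations by zeroing weights concrete, here is a minimal sketch of unstructured magnitude-based pruning in PyTorch. It is illustrative only, not the structured-sparsity recipe the linked post describes; the function name and sparsity target are placeholders.

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the smallest-magnitude entries so that roughly `sparsity`
    of the weights become zero (unstructured pruning)."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight
    # Threshold = k-th smallest absolute value; everything at or below it is dropped.
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    return weight * mask

# Example: prune a dense layer's weight matrix to ~50% sparsity.
w = torch.randn(256, 256)
w_sparse = magnitude_prune(w, sparsity=0.5)
print(f"Nonzero fraction: {w_sparse.count_nonzero().item() / w_sparse.numel():.2f}")
```

After pruning, the zeroed positions contribute nothing to matrix multiplies, which is what sparsity-aware kernels exploit to skip work at inference time.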
Discover how to build a robust MLOps practice for continuous delivery and automated deployment of AI workloads at scale.
Join us for these GTC 2022 sessions to learn about optimizing PyTorch models, accelerating graph neural networks, improving GPU performance, and more.
Join our deep learning sessions at GTC 2022 to learn about real-world use cases, new tools, and best practices for training and inference.
As the size and complexity of large language models (LLMs) continue to grow, NVIDIA is today announcing updates to the NeMo framework that provide training speed-ups of up to 30%. These updates, which include two trailblazing techniques and a hyperparameter tool to optimize and scale training of LLMs on any number of GPUs, offer new capabilities to train and deploy models using the NVIDIA AI…
GPU-accelerated workloads are thriving across all industries, from the use of AI for better customer engagement and data analytics for business forecasting to advanced visualization for quicker product innovation. One of the biggest challenges with GPU-accelerated infrastructure is choosing the right hardware systems. While the line of business cares about performance and the ability to use a…
AI is transforming every industry, enabling powerful new applications and use cases that simply weren't possible with traditional software. As AI continues to proliferate, and with the size and complexity of AI models on the rise, significant advances in AI compute performance are required to keep up. That's where the NVIDIA platform comes in. With a full-stack approach spanning chips…
Enterprises across industries are leveraging natural language processing (NLP) solutions, from chatbots to audio transcription, to improve customer engagement, increase employee productivity, and drive revenue growth. NLP is one of the most challenging tasks for AI because it must understand the underlying context of text without explicit rules in human language. Building an AI-powered solution…
This post is the third in a series on Autonomous Driving at Scale, developed with Tata Consultancy Services (TCS). The previous posts provided a general overview of deep learning inference for object detection and covered the object detection inference process and object detection metrics. In this post, we conclude with a brief look at the optimization techniques and deployment of an end-to-end…
With third-generation Tensor Core technology, NVIDIA recently unveiled the A100 Tensor Core GPU, which delivers unprecedented acceleration at every scale for AI, data analytics, and high-performance computing. Along with the great performance increase over prior-generation GPUs comes another groundbreaking innovation, Multi-Instance GPU (MIG). With MIG, each A100 GPU can be partitioned into up to seven…
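As a companion to the MIG description above, the sketch below shows one way the partitioned instances become visible to software. It assumes the nvidia-ml-py (pynvml) bindings and a MIG-enabled A100; treat the enumeration flow as an illustrative sketch rather than the post's own code.

```python
# Minimal sketch: list the MIG instances carved out of GPU 0,
# assuming nvidia-ml-py (pynvml) and a MIG-enabled A100.
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

# An A100 can expose up to seven MIG devices; iterate over the possible slots.
for i in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)):
    try:
        mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, i)
    except pynvml.NVMLError:
        continue  # no MIG device configured in this slot
    # Each UUID can be passed via CUDA_VISIBLE_DEVICES to pin a workload
    # to that isolated instance.
    print(f"MIG slot {i}: {pynvml.nvmlDeviceGetUUID(mig)}")

pynvml.nvmlShutdown()
```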
This post is the first in a series on Autonomous Driving at Scale, developed with Tata Consultancy Services (TCS). In this post, we provide a general overview of deep learning inference for object detection. The next posts cover the object detection inference process and metrics, as well as optimization techniques and deployment of an end-to-end inference pipeline.
Deep neural network (DNN) development for self-driving cars is a demanding workload. In this post, we validate DGX multi-node, multi-GPU, distributed training running on Red Hat OpenShift in the DXC Robotic Drive environment. We used OpenShift 3.11, also part of the Robotic Drive containerized compute platform, to orchestrate and execute the deep learning (DL) workloads.
Let's imagine a situation. You buy a brand-new, cutting-edge, Volta-powered DGX-2 server. You've done your math right, expecting a 2x performance increase in ResNet50 training over the DGX-1 you had before. You plug it into your rack cabinet and run the training. That's when an unpleasant surprise pops up. Even though your math is correct, the speedup you're getting is lower than expected. Why?
Earlier this year in March, we showed retinanet-examples, an open-source example of how to accelerate the training and deployment of an object detection pipeline for GPUs. We presented the project at NVIDIA's GPU Technology Conference in San Jose. This post discusses the motivation for this work, a high-level description of the architecture, and a brief look under the hood at the optimizations we…
Labellio is the world's easiest deep learning web service for computer vision. It aims to provide a deep learning environment for image data where non-experts in deep learning can experiment with their ideas for image classification applications. Watch our video embedded here to see how easy it is. The challenges in deep learning today are not just in configuring hyperparameters or…
As CUDA Educator at NVIDIA, I work to make massively parallel programming education and training accessible to everyone, whether or not they have access to GPUs on their own machines. This is why, in partnership with qwikLABS, NVIDIA has made the hands-on content we use to train thousands of developers at the Supercomputing Conference and the GPU Technology Conference online and accessible from…