A team of Stanford University researchers has developed an LLM system to cut through bureaucratic red tape. The LLM, dubbed the System for Statutory Research, or STARA, can help policymakers quickly and cheaply parse voluminous collections of rules to identify laws that are redundant, outdated, or overly onerous. Ultimately, it can make governments more efficient, the researchers say.
Modern high-performance computing (HPC) is enabling more than just quick calculations; it's powering AI systems that are unlocking scientific breakthroughs. HPC has gone through many iterations, each sparked by a creative repurposing of technologies. For example, early supercomputers used off-the-shelf components. Researchers later built powerful clusters from personal computers and even…
Conservationists have launched a new AI tool that can sift through petabytes of underwater imaging from anywhere in the world to identify signs of abandoned or lost fishing nets, so-called ghost nets. Each year, around 2% of the world's fishing gear, including roughly 80,000 square kilometers of fishing nets, is lost in the oceans. Those nets threaten marine wildlife like seals, turtles…
With the growth of large language models (LLMs), deep learning is advancing both model architecture design and computational efficiency. Mixed precision training, which strategically employs lower precision formats like brain floating point 16 (BF16) for computationally intensive operations while retaining the stability of 32-bit floating-point (FP32) where needed, has been a key strategy for…
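As a rough sketch of that strategy (not the post's exact recipe; the model, data, and hyperparameters below are placeholders), a PyTorch-style training step can run the compute-heavy forward and loss in BF16 under autocast while the weights and optimizer update stay in FP32:

```python
import torch
from torch import nn

# Minimal BF16/FP32 mixed precision sketch; requires a GPU with BF16 support.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # weights remain FP32
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 1024, device="cuda")
y = torch.randint(0, 10, (32,), device="cuda")

for _ in range(10):
    optimizer.zero_grad(set_to_none=True)
    # Compute-intensive ops run in BF16 inside autocast; reductions and the
    # optimizer step stay in FP32 for stability.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```

Because BF16 keeps FP32's exponent range, no loss scaling is shown here; FP16-based recipes typically add a gradient scaler.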
Generative AI has revolutionized how people bring ideas to life, and agentic AI represents the next leap forward in this technological evolution. By leveraging sophisticated, autonomous reasoning and iterative planning, AI agents can tackle complex, multistep problems with remarkable efficiency. As AI continues to revolutionize industries, the demand for running AI models locally has surged.
Generative AI is unlocking new computing applications that greatly augment human capability, enabled by continued model innovation. Generative AI models, including large language models (LLMs), are used for crafting marketing copy, writing computer code, rendering detailed images, composing music, generating videos, and more. The amount of compute required by the latest models is immense and…
The most exciting computing applications currently rely on training and running inference on complex AI models, often in demanding, real-time deployment scenarios. High-performance, accelerated AI platforms are needed to meet the demands of these applications and deliver the best user experiences. New AI models are constantly being invented to enable new capabilities…
Today during the 2022 NVIDIA GTC Keynote address, NVIDIA CEO Jensen Huang introduced the new NVIDIA H100 Tensor Core GPU based on the new NVIDIA Hopper GPU architecture. This post gives you a look inside the new H100 GPU and describes important new features of NVIDIA Hopper architecture GPUs. The NVIDIA H100 Tensor Core GPU is our ninth-generation data center GPU designed to deliver an…
Today, NVIDIA is enabling developers to explore and evaluate experimental AI models for Deep Learning Super Sampling (DLSS). Developers can download experimental dynamic-link libraries (DLLs), test how the latest DLSS research enhances their games, and provide feedback for future improvements. NVIDIA DLSS is a deep learning neural network that boosts frame rates and generates beautiful…
This post was updated July 20, 2021 to reflect NVIDIA TensorRT 8.0 updates. Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. When deploying a neural network, it's useful to think about how the network could be made to run faster or take less space. A more efficient network can make better…
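As one hedged illustration of that optimization step, here is a minimal sketch using the TensorRT Python API that parses an ONNX model and builds an FP16-enabled engine; the file names and the choice of FP16 are assumptions for the example, not details from the post:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
# Explicit-batch flag as used with the TensorRT 8.x ONNX parser.
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:          # placeholder model path
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)        # allow reduced-precision kernels

engine_bytes = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine_bytes)                    # serialized engine for deployment
```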
DLSS is a deep learning, super-resolution network that boosts frame rates by rendering fewer pixels and then using AI to construct sharp, higher-resolution images. Dedicated computational units on NVIDIA RTX GPUs called Tensor Cores accelerate the AI calculations, allowing the algorithm to run in real time. DLSS pairs perfectly with computationally intensive rendering algorithms such as real-time…
NVIDIA Ampere GPU architecture introduced the third generation of Tensor Cores, with the new TensorFloat32 (TF32) mode for accelerating FP32 convolutions and matrix multiplications. TF32 mode is the default option for AI training with 32-bit variables on Ampere GPU architecture. It brings Tensor Core acceleration to single-precision DL workloads, without needing any changes to model scripts.
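For a concrete sense of "no changes to model scripts", a small sketch using PyTorch's TF32 switches (the flag names are PyTorch's own, and whether they default to on depends on the framework version):

```python
import torch

# FP32 tensors and ordinary code; only the backend math mode changes.
torch.backends.cuda.matmul.allow_tf32 = True   # FP32 matmuls may run as TF32 on Tensor Cores
torch.backends.cudnn.allow_tf32 = True         # cuDNN convolutions may run as TF32

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
c = a @ b                                      # eligible for TF32 Tensor Core execution on Ampere
```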
Tuned math libraries are an easy and dependable way to extract the ultimate performance from your HPC system. However, for long-lived applications or those that need to run on a variety of platforms, adapting library calls for each vendor or library version can be a maintenance nightmare. A compiler that can automatically generate calls to tuned math libraries gives you the best of both…
The NVIDIA A100, based on the NVIDIA Ampere GPU architecture, offers a suite of exciting new features: third-generation Tensor Cores, Multi-Instance GPU (MIG), and third-generation NVLink. Ampere Tensor Cores introduce a novel math mode dedicated to AI training: TensorFloat-32 (TF32). TF32 is designed to accelerate the processing of FP32 data types, commonly used in DL workloads.
Organizations of all kinds are incorporating AI into their research, development, product, and business processes. This helps them meet and exceed their particular goals, and also helps them gain experience and knowledge to take on even bigger challenges. However, traditional compute infrastructures aren't suitable for AI due to slow CPU architectures and varying system requirements for different…
Today, during the 2020 NVIDIA GTC keynote address, NVIDIA founder and CEO Jensen Huang introduced the new NVIDIA A100 GPU based on the new NVIDIA Ampere GPU architecture. This post gives you a look inside the new A100 GPU, and describes important new features of NVIDIA Ampere architecture GPUs. The diversity of compute-intensive applications running in modern cloud data centers has driven…
Medical image segmentation is a hot topic in the deep learning community. Proof of that is the number of challenges, competitions, and research projects being conducted in this area, which only rises year over year. Among all the different approaches to this problem, U-Net has become the backbone of many of the top-performing solutions for both 2D and 3D segmentation tasks. This is due to its…
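For orientation only, here is a deliberately tiny U-Net-style encoder-decoder with a single skip connection; it sketches the pattern, not the configuration used by the top-performing solutions:

```python
import torch
from torch import nn

class TinyUNet(nn.Module):
    """Minimal U-Net-style model: downsample, upsample, and one skip connection."""
    def __init__(self, in_ch=1, out_ch=1):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)
        # The decoder sees upsampled features concatenated with the encoder skip.
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, out_ch, 1))

    def forward(self, x):
        skip = self.enc(x)                  # high-resolution features
        mid = self.mid(self.down(skip))     # low-resolution bottleneck
        up = self.up(mid)                   # back to the skip's resolution
        return self.dec(torch.cat([up, skip], dim=1))

logits = TinyUNet()(torch.randn(1, 1, 64, 64))   # (1, 1, 64, 64) segmentation logits
```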
As more and more deep learning models are being deployed into production environments, there is a growing need for a separation between the work on the model itself, and the work of integrating it into a production pipeline. Windows ML caters to this demand by addressing efficient deployment of pretrained deep learning models into Windows applications. Developing and training the model itself…
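Windows ML consumes ONNX models, so the hand-off from the model-development side is typically an ONNX export. A minimal sketch, assuming a placeholder model and file name:

```python
import torch
from torch import nn

# Placeholder network; in practice this would be your trained model.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                      nn.Linear(8, 10)).eval()
dummy = torch.randn(1, 3, 224, 224)   # example input that defines the exported graph

# Export to ONNX, the format Windows ML loads; file name and opset are illustrative.
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["logits"],
                  opset_version=12)
```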
Every year, clever researchers introduce ever more complex and interesting deep learning models to the world. There is of course a big difference between a model that works as a nice demo in isolation and a model that performs a function within a production pipeline. This is particularly pertinent to creative apps where generative models must run with low latency to generate or enhance image…
Sign up for the latest Speech AI News from NVIDIA. This post, intended for developers with a professional-level understanding of deep learning, will help you produce a production-ready AI text-to-speech model. Converting text into high-quality, natural-sounding speech in real time has been a challenging conversational AI task for decades. State-of-the-art speech synthesis models are based on…
Large scale language models (LSLMs) such as BERT, GPT-2, and XL-Net have brought about exciting leaps in state-of-the-art accuracy for many natural language understanding (NLU) tasks. Since its release in October 2018, BERT (Bidirectional Encoder Representations from Transformers) remains one of the most popular language models and still delivers state-of-the-art accuracy at the time of writing.
Object detection remains the primary driver for applications such as autonomous driving and intelligent video analytics. Object detection applications require substantial training using vast datasets to achieve high levels of accuracy. NVIDIA GPUs excel at the parallel compute performance required to train the large networks used for object detection inference.
Our most popular question is "What can I do to get great GPU performance for deep learning?" We've recently published a detailed Deep Learning Performance Guide to help answer this question. The guide explains how GPUs process data and gives tips on how to design networks for better performance. We also take a close look at Tensor Core optimization to help improve performance. This post takes a…
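One example of the kind of tip covered under Tensor Core optimization is dimension alignment: FP16 matrix dimensions that are multiples of 8 map cleanly onto Tensor Core tiles. A hedged sketch, with sizes chosen only to illustrate the rule:

```python
import torch
from torch import nn

# Illustrative only: channel counts and batch size are multiples of 8 so the
# underlying FP16 GEMM aligns with Tensor Core tile sizes.
layer = nn.Linear(in_features=1024, out_features=4096).half().cuda()
x = torch.randn(256, 1024, device="cuda", dtype=torch.float16)   # batch of 256
y = layer(x)   # FP16 GEMM with every dimension a multiple of 8
```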
Machine learning harnesses computing power to solve a variety of "hard" problems that seemed impossible to program using traditional languages and techniques. Machine learning avoids the need for a programmer to explicitly program the steps in solving a complex pattern-matching problem such as understanding speech or recognizing objects within an image. NVIDIA aims to bring machine learning to…
The CUDA Fortran compiler from PGI now supports programming Tensor Cores with NVIDIA's Volta V100 and Turing GPUs. This enables scientific programmers using Fortran to take advantage of FP16 matrix operations accelerated by Tensor Cores. Let's take a look at how Fortran supports Tensor Cores. Tensor Cores offer substantial performance gains over typical CUDA GPU core programming on Tesla V100…
Whether to employ mixed precision to train your TensorFlow models is no longer a tough decision. NVIDIA's Automatic Mixed Precision (AMP) feature for TensorFlow, recently announced at GTC 2019, enables mixed precision training by making all the required model and optimizer adjustments internally within TensorFlow with minimal programmer intervention.
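The AMP feature described here is a graph rewrite inside TensorFlow; for readers on current TensorFlow, the Keras mixed precision policy expresses the same idea. A sketch (not the original GTC 2019 API), with a placeholder model:

```python
import tensorflow as tf

# Global policy: compute in float16, keep variables in float32.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1024,)),
    tf.keras.layers.Dense(1024, activation="relu"),
    # Keep the final softmax in float32 for numerical stability.
    tf.keras.layers.Dense(10, activation="softmax", dtype="float32"),
])
# Under the mixed_float16 policy, Keras applies loss scaling for you when
# compiling with a string optimizer and training via fit().
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```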
Neural networks with thousands of layers and millions of neurons demand high performance and faster training times. The complexity and size of neural networks continue to grow. Mixed-precision training using Tensor Cores on Volta and Turing architectures enables higher performance while maintaining network accuracy for heavily compute- and memory-intensive Deep Neural Networks (DNNs).
Double-precision floating point (FP64) has been the de facto standard for doing scientific simulation for several decades. Most numerical methods used in engineering and scientific applications require the extra precision to compute correct answers or even reach an answer. However, FP64 also requires more computing resources and runtime to deliver the increased precision levels.
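A toy NumPy comparison of the same accumulation in FP32 and FP64 shows the precision gap that drives this trade-off (illustrative only; the specific values are placeholders):

```python
import numpy as np

# The same summation carried out at two precisions.
x32 = np.full(10_000_000, 0.1, dtype=np.float32)
x64 = x32.astype(np.float64)

print(x32.sum())   # FP32 result loses digits to representation and rounding error
print(x64.sum())   # FP64 retains several more significant digits of the same sum
```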
Fueled by the ongoing growth of the gaming market and its insatiable demand for better 3D graphics, NVIDIA has evolved the GPU into the world's leading parallel processing engine for many computationally intensive applications. In addition to rendering highly realistic and immersive 3D games, NVIDIA GPUs also accelerate content creation workflows, high performance computing (HPC) and datacenter…
Mixed precision combines different numerical precisions in a computational method. Using precision lower than FP32 reduces memory usage, allowing deployment of larger neural networks. Data transfers take less time, and compute performance increases, especially on NVIDIA GPUs with Tensor Core support for that precision. Mixed-precision training of DNNs achieves two main objectives: This video…
Neural network models have quickly taken advantage of NVIDIA Tensor Cores for deep learning since their introduction in the Tesla V100 GPU last year. For example, new performance records for ResNet50 training were announced recently with Tensor Core-based solutions. (See the NVIDIA developer post on new performance milestones for additional details). NVIDIA's cuDNN library enables CUDA…
NVIDIA has released TensorRT 4 at CVPR 2018. This new version of TensorRT, NVIDIA's powerful inference optimizer and runtime engine, provides: Additional features include the ability to execute custom neural network layers using FP16 precision and support for the Xavier SoC through NVIDIA DRIVE AI platforms. TensorRT 4 speeds up deep learning inference applications such as neural machine…
Artificial intelligence powered by deep learning now solves challenges once thought impossible, such as computers understanding and conversing in natural speech and autonomous driving. Inspired by the effectiveness of deep learning to solve a great many challenges, the exponentially growing complexity of algorithms has resulted in a voracious appetite for faster computing.
A defining feature of the new NVIDIA Volta GPU architecture is Tensor Cores, which give the NVIDIA V100 accelerator a peak throughput that is 12x the 32-bit floating point throughput of the previous-generation NVIDIA P100. Tensor Cores enable you to use mixed-precision for higher throughput without sacrificing accuracy. Tensor Cores provide a huge boost to convolutions and matrix operations.
Deep Neural Networks (DNNs) have led to breakthroughs in a number of areas, including image processing and understanding, language modeling, language translation, speech processing, game playing, and many others. DNN complexity has been increasing to achieve these results, which in turn has increased the computational resources required to train these networks. Mixed-precision training lowers the…
At the 2017 GPU Technology Conference NVIDIA announced CUDA 9, the latest version of CUDA's powerful parallel computing platform and programming model. The CUDA Toolkit version 9.0 is now available as a free download. In this post I'll provide an overview of the awesome new features of CUDA 9. To learn more you can watch the recording of my talk from GTC…