Volta – NVIDIA Technical Blog News and tutorials for developers, data scientists, and IT admins 2025-07-09T19:00:00Z http://www.open-lab.net/blog/feed/ Deepak Unnikrishnan <![CDATA[CUDA 12.1 Supports Large Kernel Parameters]]> http://www.open-lab.net/blog/?p=66058 2024-08-28T17:39:46Z 2023-06-05T17:00:00Z CUDA kernel function parameters are passed to the device through constant memory and have been limited to 4,096 bytes. CUDA 12.1 increases this parameter limit...]]>

Source

]]>
4
Michał Szołucha <![CDATA[Case Study: ResNet50 with DALI]]> http://www.open-lab.net/blog/?p=15089 2023-07-05T19:40:54Z 2019-07-02T13:00:47Z Let's imagine a situation. You buy a brand-new, cutting-edge, Volta-powered DGX-2 server. You've done your math right, expecting a 2x performance increase...]]>

Let's imagine a situation. You buy a brand-new, cutting-edge, Volta-powered DGX-2 server. You've done your math right, expecting a 2x performance increase in ResNet50 training over the DGX-1 you had before. You plug it into your rack cabinet and run the training. That's when an unpleasant surprise pops up. Even though your math is correct, the speedup you're getting is lower than expected. Why?
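The gap can be sketched with a back-of-the-envelope model: if the CPU-side input pipeline (JPEG decode, augmentation) contributes a fixed per-batch cost, only the GPU portion speeds up, so end-to-end throughput follows Amdahl's law. The timings below are illustrative assumptions, not measurements:

```python
# Why doubling GPU throughput may not double end-to-end training speed:
# if the CPU-side input pipeline contributes a fixed per-batch cost, the
# observed speedup follows Amdahl's law.

def observed_speedup(t_compute: float, t_input: float, compute_speedup: float) -> float:
    """End-to-end speedup when only the GPU compute portion is accelerated."""
    old = t_compute + t_input
    new = t_compute / compute_speedup + t_input
    return old / new

# Hypothetical numbers: 80 ms/batch GPU compute, 20 ms/batch CPU data
# loading, and GPUs that get 2x faster (DGX-1 -> DGX-2):
print(round(observed_speedup(0.080, 0.020, 2.0), 2))  # 1.67, not 2.0
```

Offloading the input pipeline to the GPU, which is what DALI does, shrinks the fixed term and recovers the expected scaling.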

Source

]]>
0
Neil Trevett <![CDATA[Machine Learning Acceleration in Vulkan with Cooperative Matrices]]> http://www.open-lab.net/blog/?p=14322 2022-08-21T23:39:25Z 2019-04-16T21:00:10Z Machine learning harnesses computing power to solve a variety of "hard" problems that seemed impossible to program using traditional languages and...]]>

Machine learning harnesses computing power to solve a variety of "hard" problems that seemed impossible to program using traditional languages and techniques. Machine learning avoids the need for a programmer to explicitly program the steps in solving a complex pattern-matching problem such as understanding speech or recognizing objects within an image. NVIDIA aims to bring machine learning to…

Source

]]>
3
Brent Leback <![CDATA[Tensor Core Programming Using CUDA Fortran]]> http://www.open-lab.net/blog/?p=14140 2023-02-13T17:46:24Z 2019-04-02T13:00:36Z The CUDA Fortran compiler from PGI now supports programming Tensor Cores with NVIDIA's Volta V100 and Turing GPUs. This enables scientific programmers using...]]>

The CUDA Fortran compiler from PGI now supports programming Tensor Cores with NVIDIA's Volta V100 and Turing GPUs. This enables scientific programmers using Fortran to take advantage of FP16 matrix operations accelerated by Tensor Cores. Let's take a look at how Fortran supports Tensor Cores. Tensor Cores offer substantial performance gains over typical CUDA GPU core programming on Tesla V100…

Source

]]>
0
Bruce Tannenbaum <![CDATA[Speeding Up Semantic Segmentation Using MATLAB Container from NVIDIA NGC]]> http://www.open-lab.net/blog/?p=13730 2023-02-13T17:38:47Z 2019-03-13T14:00:24Z Gone are the days of using a single GPU to train a deep learning model. With computationally intensive algorithms such as semantic segmentation, a single GPU...]]>

Gone are the days of using a single GPU to train a deep learning model. With computationally intensive algorithms such as semantic segmentation, a single GPU can take days to optimize a model. But multi-GPU hardware is expensive, you say. Not any longer; NVIDIA multi-GPU hardware on cloud instances like the AWS P3 allow you to pay for only what you use. Cloud instances allow you to take…

Source

]]>
0
Amulya Vishwanath <![CDATA[Video Series: Mixed-Precision Training Techniques Using Tensor Cores for Deep Learning]]> http://www.open-lab.net/blog/?p=13416 2022-08-21T23:39:19Z 2019-01-30T18:00:34Z Neural networks with thousands of layers and millions of neurons demand high performance and faster training times. The complexity and size of neural networks...]]>

Neural networks with thousands of layers and millions of neurons demand high performance and faster training times. The complexity and size of neural networks continue to grow. Mixed-precision training using Tensor Cores on Volta and Turing architectures enables higher performance while maintaining network accuracy for heavily compute- and memory-intensive Deep Neural Networks (DNNs).

Source

]]>
0
Geetika Gupta <![CDATA[Using Tensor Cores for Mixed-Precision Scientific Computing]]> http://www.open-lab.net/blog/?p=13346 2022-08-21T23:39:18Z 2019-01-23T14:00:44Z Double-precision floating point (FP64) has been the de facto standard for doing scientific simulation for several decades. Most numerical methods used in...]]>

Double-precision floating point (FP64) has been the de facto standard for doing scientific simulation for several decades. Most numerical methods used in engineering and scientific applications require the extra precision to compute correct answers or even reach an answer. However, FP64 also requires more computing resources and runtime to deliver the increased precision levels.
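A minimal sketch of why that extra precision matters: FP32's 24-bit significand cannot represent every integer above 2^24, so a small update can be lost outright, while FP64's 53-bit significand absorbs it without trouble.

```python
import numpy as np

# FP32 has a 24-bit significand: above 2**24 it cannot represent every
# integer, so adding 1.0 can be lost entirely. FP64 (53-bit significand)
# is unaffected at this scale -- the "extra precision" many solvers rely on.
a32 = np.float32(2**24) + np.float32(1.0)
a64 = np.float64(2**24) + np.float64(1.0)
print(a32 == np.float32(2**24))  # True: the update was absorbed
print(a64 == np.float64(2**24))  # False: FP64 kept the increment
```

Mixed-precision approaches keep the bulk of the arithmetic in lower precision while accumulating or correcting in higher precision, which is the trade the post explores.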

Source

]]>
2
Olivier Giroux <![CDATA[CUDA on Turing Opens New GPU Compute Possibilities]]> http://www.open-lab.net/blog/?p=12703 2023-02-13T17:35:16Z 2018-11-07T14:00:31Z The Turing architecture introduces so many cool new features that it's easy to miss the quiet revolution in GPU programming that it also represents: all of...]]>

Source

]]>
1
Robert Sohigian <![CDATA[NVSwitch Accelerates NVIDIA DGX-2]]> http://www.open-lab.net/blog/?p=11571 2022-08-21T23:39:00Z 2018-08-21T16:15:04Z NVIDIA CEO Jensen Huang described the NVIDIA DGX-2™ server as "the world's largest GPU" at its launch during GPU Technology Conference earlier this...]]>

NVIDIA CEO Jensen Huang described the NVIDIA DGX-2 server as "the world's largest GPU" at its launch during GPU Technology Conference earlier this year. DGX-2 comprises 16 NVIDIA Tesla V100 32 GB GPUs and other top-drawer components (two 24-core Xeon CPUs, 1.5 TB of DDR4 DRAM memory, and 30 TB of NVMe storage) in a single system, delivering two petaFLOPS of performance, qualifying it as one of…
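The two-petaFLOPS figure is straightforward arithmetic over the Tensor Core peak of the individual GPUs: each Tesla V100 delivers roughly 125 TFLOPS of mixed-precision throughput, and DGX-2 carries 16 of them.

```python
# Peak mixed-precision throughput of DGX-2, from per-GPU Tensor Core specs.
v100_tensor_tflops = 125   # Tesla V100 FP16 Tensor Core peak, in TFLOPS
gpus = 16                  # GPUs in a DGX-2, linked via NVSwitch
total_tflops = v100_tensor_tflops * gpus
print(total_tflops / 1000)  # 2.0 petaFLOPS
```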

Source

]]>
3
Nefi Alarcon <![CDATA[Introducing Apex: PyTorch Extension with Tools to Realize the Power of Tensor Cores]]> https://news.www.open-lab.net/?p=10617 2022-08-21T23:45:26Z 2018-06-19T12:59:17Z Today at the Computer Vision and Pattern Recognition Conference in Salt Lake City, Utah, NVIDIA is kicking off the conference by demonstrating an early release...]]>

Today at the Computer Vision and Pattern Recognition Conference in Salt Lake City, Utah, NVIDIA is kicking off the conference by demonstrating an early release of Apex, an open-source PyTorch extension that helps users maximize deep learning training performance on NVIDIA Volta GPUs. Inspired by state-of-the-art mixed precision training in translational networks, sentiment analysis…

Source

]]>
0
Fernanda Foertter <![CDATA[Summit GPU Supercomputer Enables Smarter Science]]> http://www.open-lab.net/blog/?p=10628 2022-08-21T23:38:54Z 2018-06-08T16:55:34Z Today the world of open science received its greatest asset in the form of the Summit supercomputer at Oak Ridge National Laboratory (ORNL). This represents an...]]>

Today the world of open science received its greatest asset in the form of the Summit supercomputer at Oak Ridge National Laboratory (ORNL). This represents an historic milestone because it is the world's first supercomputer fusing high performance, data-intensive, and AI computing into one system. Summit is capable of delivering a peak 200 petaflops, ten times faster than its Titan predecessor…

Source

]]>
3
Nefi Alarcon <![CDATA[A Trio of New Nsight Tools That Empower Developers to Fully Optimize their CPU and GPU Performance]]> https://news.www.open-lab.net/?p=10493 2024-08-28T18:22:50Z 2018-06-01T01:01:29Z Three big NVIDIA Nsight releases on the same day! Nsight Systems is a brand new optimization tool; Nsight Visual Studio Edition 5.6 extends support to Volta...]]>

Three big NVIDIA Nsight releases on the same day! Nsight Systems is a brand-new optimization tool; Nsight Visual Studio Edition 5.6 extends support to Volta GPUs and Win10 RS4; and Nsight Graphics 1.2 replaces the current Linux Graphics Debugger. NVIDIA Nsight Systems is a low-overhead performance analysis tool designed to provide the insights developers need to optimize their software.

Source

]]>
0
Nefi Alarcon <![CDATA[OpenSeq2Seq: New Toolkit for Distributed and Mixed-Precision Training of Sequence-to-Sequence Models]]> https://news.www.open-lab.net/?p=9902 2022-08-21T23:45:00Z 2018-04-21T19:36:00Z Researchers at NVIDIA open-sourced v0.2 of OpenSeq2Seq, a new toolkit built on top of TensorFlow for training sequence-to-sequence models. OpenSeq2Seq...]]>

Researchers at NVIDIA open-sourced v0.2 of OpenSeq2Seq, a new toolkit built on top of TensorFlow for training sequence-to-sequence models. OpenSeq2Seq provides researchers with optimized implementations of various sequence-to-sequence models commonly used for applications such as machine translation and speech recognition. OpenSeq2Seq is performance-optimized for mixed-precision training using…

Source

]]>
0
Brad Nemire <![CDATA[Nsight Visual Studio Edition 5.5 Introduces Graphics Pixel History, Next-Gen CUDA GPU+CPU debugging, Next-Gen CUDA Profiling, and now supports Volta GPUs, Win10 RS3, and CUDA 9.1]]> https://news.www.open-lab.net/?p=10706 2023-10-25T23:54:27Z 2018-01-03T17:52:40Z NVIDIA Nsight Visual Studio Edition 5.5 is now available for download in the NVIDIA Registered Developer Program. This release extends support to the latest...]]>

NVIDIA Nsight Visual Studio Edition 5.5 is now available for download in the NVIDIA Registered Developer Program. This release extends support to the latest Volta GPUs and Win10 RS3. The Graphics Debugger adds Pixel History (DirectX 11, OpenGL) and OpenVR 1.0.10 support as well as Vulkan and Range Profiler improvements. Nsight Visual Studio Edition version 5.5 also introduces new compute tools…

Source

]]>
0
Shashank Prasanna <![CDATA[TensorRT 3: Faster TensorFlow Inference and Volta Support]]> http://www.open-lab.net/blog/parallelforall/?p=8664 2022-08-21T23:38:34Z 2017-12-04T17:00:59Z NVIDIA TensorRT™ is a high-performance deep learning inference optimizer and runtime that delivers low latency, high-throughput inference for deep...]]>

NVIDIA TensorRT is a high-performance deep learning inference optimizer and runtime that delivers low latency, high-throughput inference for deep learning applications. NVIDIA released TensorRT last year with the goal of accelerating deep learning inference for production deployment. In this post we'll introduce TensorRT 3, which improves performance versus previous versions and includes new…

Source

]]>
16
Nikolay Sakharnykh <![CDATA[Maximizing Unified Memory Performance in CUDA]]> http://www.open-lab.net/blog/parallelforall/?p=8603 2022-08-21T23:38:33Z 2017-11-20T03:37:53Z Many of today's applications process large volumes of data. While GPU architectures have very fast HBM or GDDR memory, they have limited capacity. Making the...]]>

Many of today's applications process large volumes of data. While GPU architectures have very fast HBM or GDDR memory, they have limited capacity. Making the most of GPU performance requires the data to be as close to the GPU as possible. This is especially important for applications that iterate over the same data multiple times or have a high flops/byte ratio. Many real-world codes have to…

Source

]]>
18
Jeremy Appleyard <![CDATA[Programming Tensor Cores in CUDA 9]]> http://www.open-lab.net/blog/parallelforall/?p=8496 2024-05-17T17:25:34Z 2017-10-17T09:29:09Z A defining feature of the new NVIDIA Volta GPU architecture is Tensor Cores, which give the NVIDIA V100 accelerator a peak throughput that is 12x...]]>

A defining feature of the new NVIDIA Volta GPU architecture is Tensor Cores, which give the NVIDIA V100 accelerator a peak throughput that is 12x the 32-bit floating point throughput of the previous-generation NVIDIA P100. Tensor Cores enable you to use mixed-precision for higher throughput without sacrificing accuracy. Tensor Cores provide a huge boost to convolutions and matrix operations.
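The mixed-precision contract can be emulated on the host to see what the hardware computes: inputs are quantized to FP16, but products are accumulated in FP32, which is why accuracy holds up far better than a pure-FP16 multiply would suggest. The helper below is an illustrative sketch of those semantics, not the CUDA WMMA API:

```python
import numpy as np

# A Tensor Core computes D = A @ B + C with FP16 inputs and FP32
# accumulation. This numpy sketch emulates those semantics on a small
# tile (the CUDA WMMA API exposes 16x16x16 tiles).
def tensor_core_tile(a, b, c):
    a16 = a.astype(np.float16)   # inputs quantized to FP16
    b16 = b.astype(np.float16)
    acc = a16.astype(np.float32) @ b16.astype(np.float32)  # FP32 accumulate
    return acc + c.astype(np.float32)  # FP32 addend and FP32 result

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4))
b = rng.standard_normal((4, 4))
c = np.zeros((4, 4))
d = tensor_core_tile(a, b, c)
print(d.dtype)  # float32
```

The only precision lost is the initial FP16 quantization of the inputs; every product and sum thereafter is carried in FP32.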

Source

]]>
14
Paulius Micikevicius <![CDATA[Mixed-Precision Training of Deep Neural Networks]]> http://www.open-lab.net/blog/parallelforall/?p=8452 2022-08-21T23:38:30Z 2017-10-11T16:00:57Z Deep Neural Networks (DNNs) have led to breakthroughs in a number of areas, including image processing and understanding, language modeling, language...]]>

Deep Neural Networks (DNNs) have led to breakthroughs in a number of areas, including image processing and understanding, language modeling, language translation, speech processing, game playing, and many others. DNN complexity has been increasing to achieve these results, which in turn has increased the computational resources required to train these networks. Mixed-precision training lowers the…
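Two ingredients of the mixed-precision recipe can be shown in a few lines: the loss (and hence the gradients) is scaled so that small gradient magnitudes do not flush to zero in FP16, and an FP32 master copy of the weights receives the unscaled update. The numbers and helper below are illustrative, not a full training loop:

```python
import numpy as np

# FP16's smallest subnormal is ~6e-8, so an unscaled gradient of 1e-8
# underflows to zero; scaling the loss (here by 1024) shifts gradients
# back into FP16's representable range.
print(np.float16(1e-8) == 0)          # True: gradient lost without scaling
print(np.float16(1e-8 * 1024.0) > 0)  # True: survives after loss scaling

def sgd_step(master_w32, scaled_grad16, lr=0.01, loss_scale=1024.0):
    """One mixed-precision SGD step: unscale the gradient in FP32, update
    the FP32 master weights, and return an FP16 copy for the next pass."""
    grad32 = scaled_grad16.astype(np.float32) / loss_scale
    master_w32 -= lr * grad32
    return master_w32.astype(np.float16)

w32 = np.ones(4, dtype=np.float32)                  # FP32 master weights
g16 = np.full(4, 1e-8 * 1024.0, dtype=np.float16)   # scaled FP16 gradients
w16 = sgd_step(w32, g16)
print(w16.dtype)  # float16
```

Keeping the master weights in FP32 prevents many tiny updates from being rounded away, which is the other half of why accuracy is maintained.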

Source

]]>
5
Adam Grzywaczewski <![CDATA[Training AI for Self-Driving Vehicles: the Challenge of Scale]]> http://www.open-lab.net/blog/parallelforall/?p=8435 2022-08-21T23:38:28Z 2017-10-10T01:24:38Z Modern deep neural networks, such as those used in self-driving vehicles, require a mind-boggling amount of computational power. Today a single computer, like...]]>

Modern deep neural networks, such as those used in self-driving vehicles, require a mind-boggling amount of computational power. Today a single computer, like NVIDIA DGX-1, can achieve computational performance on par with the world's biggest supercomputers in the year 2010 ("Top 500", 2010). Even though this technological advance is unprecedented, it is being dwarfed by the computational hunger…

Source

]]>
0
Brad Nemire <![CDATA[Microsoft Releases New Version of High-Performance, Open-Source, Deep Learning Toolkit]]> https://news.www.open-lab.net/?p=8488 2022-08-21T23:43:33Z 2017-06-01T18:49:35Z Previously known as CNTK, the Microsoft Cognitive Toolkit version 2.0 allows developers to create, train, and evaluate their own neural networks that can scale...]]>

Previously known as CNTK, the Microsoft Cognitive Toolkit version 2.0 allows developers to create, train, and evaluate their own neural networks that can scale across multiple GPUs and multiple machines on massive data sets. The open-source toolkit, available on GitHub, offers hundreds of new features, performance improvements and fixes that have been added since the beta version of CNTK.

Source

]]>
0