Stephen Jones, a leading expert and distinguished NVIDIA CUDA architect, offers his guidance and insights with a deep dive into the complexities of mapping applications onto massively parallel machines. Going beyond the basics to explore the intricacies of GPU programming, he focuses on practical techniques such as parallel program design and specific details of GPU optimization for improving the…
GPUs are specially designed to crunch through massive amounts of data at high speed. They have a large amount of compute resources, called streaming multiprocessors (SMs), and an array of facilities to keep them fed with data: high bandwidth to memory, sizable data caches, and the capability to switch to other teams of workers (warps) without any overhead if an active team has run out of data.
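As a concrete illustration of that last point (a minimal sketch, not taken from the post itself): launching many more threads than the GPU has cores is what lets each SM hide memory latency, because the hardware scheduler swaps in ready warps while stalled warps wait on memory. The `scale` kernel below is a hypothetical example.

```cuda
#include <cuda_runtime.h>

// Hypothetical element-wise kernel: each thread handles one element.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 24;                 // ~16M elements
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    // Launch far more threads than the GPU has cores: while some warps
    // wait on memory, the SM scheduler switches to ready warps at no cost.
    int block = 256;
    int grid  = (n + block - 1) / block;   // ~65K blocks of 256 threads
    scale<<<grid, block>>>(d, 2.0f, n);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```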
Differentiable Slang easily integrates with existing codebases, from Python, PyTorch, and CUDA to HLSL, to aid multiple computer graphics tasks and enable novel data-driven and neural research. In this post, we introduce several code examples using differentiable Slang to demonstrate the potential use across different rendering applications and the ease of integration. This is part of a series…
NVIDIA just released a SIGGRAPH Asia 2023 research paper, SLANG.D: Fast, Modular and Differentiable Shader Programming. The paper shows how a single language can serve as a unified platform for real-time, inverse, and differentiable rendering. The work is a collaboration between MIT, UCSD, UW, and NVIDIA researchers. This is part of a series on Differentiable Slang. For more information about…
On July 26, connect with NVIDIA CUDA product team experts on the latest CUDA Toolkit 12.
The latest release of CUDA Toolkit 12.2 introduces a range of essential new features, modifications to the programming model, and enhanced support for hardware capabilities that accelerate CUDA applications. Now generally available from NVIDIA, CUDA Toolkit 12.2 includes many new capabilities, both major and minor. The following post offers an overview of many of the key…
CUDA Toolkit 12.0 introduces a new nvJitLink library for Just-in-Time Link Time Optimization (JIT LTO) support. In the early days of CUDA, to get maximum performance, developers had to build and compile CUDA kernels as a single source file in whole program compilation mode. This limited SDKs and applications with large swaths of code, spanning multiple files that required separate compilation, from porting…
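To make the workflow concrete, here is a minimal sketch of driving the nvJitLink C API to link two separately compiled LTO-IR blobs into a cubin at run time. The buffer names, the `-arch=sm_80` target, and the omission of error checking are assumptions for brevity; consult the nvJitLink documentation for the full contract.

```cuda
#include <cstdlib>
#include <nvJitLink.h>

// Link two LTO-IR inputs (e.g., produced with `nvcc -dlto -dc`) at run time.
// The caller supplies the blobs; the returned cubin can be loaded with
// cuModuleLoadData. Return-code checks are omitted to keep the sketch short.
void *jitLinkLtoir(const void *ltoirA, size_t sizeA,
                   const void *ltoirB, size_t sizeB, size_t *cubinSize) {
    nvJitLinkHandle handle;
    const char *opts[] = {"-lto", "-arch=sm_80"};  // target arch is an assumption
    nvJitLinkCreate(&handle, 2, opts);

    nvJitLinkAddData(handle, NVJITLINK_INPUT_LTOIR, ltoirA, sizeA, "a.ltoir");
    nvJitLinkAddData(handle, NVJITLINK_INPUT_LTOIR, ltoirB, sizeB, "b.ltoir");

    // Link-time optimization happens here, across both inputs.
    nvJitLinkComplete(handle);

    nvJitLinkGetLinkedCubinSize(handle, cubinSize);
    void *cubin = malloc(*cubinSize);
    nvJitLinkGetLinkedCubin(handle, cubin);
    nvJitLinkDestroy(&handle);
    return cubin;
}
```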
To best ensure the security and reliability of our RPM and Debian package repositories, NVIDIA is updating and rotating the signing keys used by the , , and package managers beginning April 27, 2022. If you don't update your repository signing keys, expect package management errors when attempting to access or install packages from CUDA repositories. To ensure continued access to the…
Back in 2012, NVIDIAN Mark Harris wrote Six Ways to Saxpy, demonstrating how to perform the SAXPY operation on a GPU in multiple ways, using different languages and libraries. Since then, programming paradigms have evolved and so has the NVIDIA HPC SDK. In this post, I demonstrate five ways to implement a simple SAXPY computation using NVIDIA GPUs. Why is this interesting?
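For readers unfamiliar with the operation: SAXPY computes y = a*x + y over single-precision vectors. Below is a sketch of the canonical CUDA C version, one thread per element; the launch configuration and use of managed memory are illustrative choices, not the post's exact code.

```cuda
#include <cuda_runtime.h>

// SAXPY: y = a*x + y, computed with one GPU thread per element.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    // Managed memory keeps the example short; explicit device copies work too.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    cudaFree(x);
    cudaFree(y);
    return 0;
}
```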
Fortran developers have long been able to accelerate their programs using CUDA Fortran or OpenACC. For more up-to-date information, please read Using Fortran Standard Parallel Programming for GPU Acceleration, which aims to instruct developers on the advantages of using parallelism in standard languages for accelerated computing. Now with the latest 20.11 release of the NVIDIA HPC SDK…
Historically, accelerating your C++ code with GPUs has not been possible in Standard C++ without using language extensions or additional libraries. In many cases, the results of these ports are worth the effort. But what if you could get the same effect without that cost? What if you could take your Standard C++ code and accelerate it on a GPU? Now you can!
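A minimal sketch of the idea, assuming compilation with `nvc++ -stdpar`: a Standard C++ parallel algorithm with an execution policy can be offloaded to the GPU with no language extensions at all. The SAXPY-style lambda here is illustrative.

```cpp
#include <algorithm>
#include <execution>
#include <vector>

// Pure Standard C++: with `nvc++ -stdpar`, the parallel execution policy
// lets this transform run on the GPU unchanged.
int main() {
    const int n = 1 << 20;
    std::vector<float> x(n, 1.0f), y(n, 2.0f);
    const float a = 2.0f;
    std::transform(std::execution::par_unseq,
                   x.begin(), x.end(), y.begin(), y.begin(),
                   [a](float xi, float yi) { return a * xi + yi; });
    return 0;
}
```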
Julia is a high-level programming language for mathematical computing that is as easy to use as Python, but as fast as C. The language has been created with performance in mind, and combines careful language design with a sophisticated LLVM-based compiler [Bezanson et al. 2017]. Julia is already well regarded for programming multicore CPUs and large parallel computing systems…
You may already know NVIDIA Tesla as a line of GPU accelerator boards optimized for high-performance, general-purpose computing. They are used for parallel scientific, engineering, and technical computing, and they are designed for deployment in supercomputers, clusters, and workstations. But it's not just the GPU boards that make Tesla a great computing solution. The combination of the world's…