Stephen Jones, a leading expert and distinguished NVIDIA CUDA architect, offers his guidance and insights with a deep dive into the complexities of mapping applications onto massively parallel machines. Going beyond the basics to explore the intricacies of GPU programming, he focuses on practical techniques such as parallel program design and specific details of GPU optimization for improving the…
GPUs are specially designed to crunch through massive amounts of data at high speed. They have a large amount of compute resources, called streaming multiprocessors (SMs), and an array of facilities to keep them fed with data: high bandwidth to memory, sizable data caches, and the capability to switch to other teams of workers (warps) without any overhead if an active team has run out of data.
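As a concrete illustration of that last point (a minimal sketch, not taken from the post itself): launching many more threads than the GPU has cores is what lets each SM hide memory latency, because the hardware scheduler swaps in ready warps while stalled warps wait on memory. The `scale` kernel below is a hypothetical example.

```cuda
#include <cuda_runtime.h>

// Hypothetical element-wise kernel: each thread handles one element.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 24;                 // ~16M elements
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    // Launch far more threads than the GPU has cores: while some warps
    // wait on memory, the SM scheduler switches to ready warps at no cost.
    int block = 256;
    int grid  = (n + block - 1) / block;   // ~65K blocks of 256 threads
    scale<<<grid, block>>>(d, 2.0f, n);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```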
Differentiable Slang easily integrates with existing codebases, from Python, PyTorch, and CUDA to HLSL, to aid multiple computer graphics tasks and enable novel data-driven and neural research. In this post, we introduce several code examples using differentiable Slang to demonstrate the potential use across different rendering applications and the ease of integration. This is part of a series…
NVIDIA just released a SIGGRAPH Asia 2023 research paper, SLANG.D: Fast, Modular and Differentiable Shader Programming. The paper shows how a single language can serve as a unified platform for real-time, inverse, and differentiable rendering. The work is a collaboration between MIT, UCSD, UW, and NVIDIA researchers. This is part of a series on Differentiable Slang. For more information about…
On July 26, connect with NVIDIA CUDA product team experts on the latest CUDA Toolkit 12.
The latest release of CUDA Toolkit 12.2 introduces a range of essential new features, modifications to the programming model, and enhanced support for hardware capabilities that accelerate CUDA applications. Now generally available from NVIDIA, CUDA Toolkit 12.2 includes many new capabilities, both major and minor. The following post offers an overview of many of the key…
CUDA Toolkit 12.0 introduces a new nvJitLink library for Just-in-Time Link Time Optimization (JIT LTO) support. In the early days of CUDA, to get maximum performance, developers had to build and compile CUDA kernels as a single source file in whole program compilation mode. This limited SDKs and applications with large swaths of code, spanning multiple files that required separate compilation, from porting…
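To make the workflow concrete, here is a minimal sketch of driving the nvJitLink C API to link two separately compiled LTO-IR blobs into a cubin at run time. The buffer names, the `-arch=sm_80` target, and the omission of error checking are assumptions for brevity; consult the nvJitLink documentation for the full contract.

```cuda
#include <cstdlib>
#include <nvJitLink.h>

// Link two LTO-IR inputs (e.g., produced with `nvcc -dlto -dc`) at run time.
// The caller supplies the blobs; the returned cubin can be loaded with
// cuModuleLoadData. Return-code checks are omitted to keep the sketch short.
void *jitLinkLtoir(const void *ltoirA, size_t sizeA,
                   const void *ltoirB, size_t sizeB, size_t *cubinSize) {
    nvJitLinkHandle handle;
    const char *opts[] = {"-lto", "-arch=sm_80"};  // target arch is an assumption
    nvJitLinkCreate(&handle, 2, opts);

    nvJitLinkAddData(handle, NVJITLINK_INPUT_LTOIR, ltoirA, sizeA, "a.ltoir");
    nvJitLinkAddData(handle, NVJITLINK_INPUT_LTOIR, ltoirB, sizeB, "b.ltoir");

    // Link-time optimization happens here, across both inputs.
    nvJitLinkComplete(handle);

    nvJitLinkGetLinkedCubinSize(handle, cubinSize);
    void *cubin = malloc(*cubinSize);
    nvJitLinkGetLinkedCubin(handle, cubin);
    nvJitLinkDestroy(&handle);
    return cubin;
}
```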
To best ensure the security and reliability of our RPM and Debian package repositories, NVIDIA is updating and rotating the signing keys used by the , , and package managers beginning April 27, 2022. If you don't update your repository signing keys, expect package management errors when attempting to access or install packages from CUDA repositories. To ensure continued access to the…
Back in 2012, NVIDIAN Mark Harris wrote Six Ways to Saxpy, demonstrating how to perform the SAXPY operation on a GPU in multiple ways, using different languages and libraries. Since then, programming paradigms have evolved and so has the NVIDIA HPC SDK. In this post, I demonstrate five ways to implement a simple SAXPY computation using NVIDIA GPUs. Why is this interesting?
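For readers unfamiliar with the operation: SAXPY computes y = a*x + y over single-precision vectors. Below is a sketch of the canonical CUDA C version, one thread per element; the launch configuration and use of managed memory are illustrative choices, not the post's exact code.

```cuda
#include <cuda_runtime.h>

// SAXPY: y = a*x + y, computed with one GPU thread per element.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    // Managed memory keeps the example short; explicit device copies work too.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    cudaFree(x);
    cudaFree(y);
    return 0;
}
```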
Fortran developers have long been able to accelerate their programs using CUDA Fortran or OpenACC. For more up-to-date information, please read Using Fortran Standard Parallel Programming for GPU Acceleration, which aims to instruct developers on the advantages of using parallelism in standard languages for accelerated computing. Now with the latest 20.11 release of the NVIDIA HPC SDK…
Historically, accelerating your C++ code with GPUs has not been possible in Standard C++ without using language extensions or additional libraries. In many cases, the results of these ports are worth the effort. But what if you could get the same effect without that cost? What if you could take your Standard C++ code and accelerate it on a GPU? Now you can!
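A minimal sketch of the idea, assuming compilation with `nvc++ -stdpar`: a Standard C++ parallel algorithm with an execution policy can be offloaded to the GPU with no language extensions at all. The SAXPY-style lambda here is illustrative.

```cpp
#include <algorithm>
#include <execution>
#include <vector>

// Pure Standard C++: with `nvc++ -stdpar`, the parallel execution policy
// lets this transform run on the GPU unchanged.
int main() {
    const int n = 1 << 20;
    std::vector<float> x(n, 1.0f), y(n, 2.0f);
    const float a = 2.0f;
    std::transform(std::execution::par_unseq,
                   x.begin(), x.end(), y.begin(), y.begin(),
                   [a](float xi, float yi) { return a * xi + yi; });
    return 0;
}
```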
Julia is a high-level programming language for mathematical computing that is as easy to use as Python, but as fast as C. The language has been created with performance in mind, and combines careful language design with a sophisticated LLVM-based compiler [Bezanson et al. 2017]. Julia is already well regarded for programming multicore CPUs and large parallel computing systems…
You may already know NVIDIA Tesla as a line of GPU accelerator boards optimized for high-performance, general-purpose computing. They are used for parallel scientific, engineering, and technical computing, and they are designed for deployment in supercomputers, clusters, and workstations. But it's not just the GPU boards that make Tesla a great computing solution. The combination of the world's…