As ray tracing becomes the predominant rendering technique in modern game engines, a single GPU RayGen shader can now perform most of the light simulation of a frame. To manage this level of complexity, it becomes necessary to observe a decomposition of shader performance at the HLSL or GLSL source-code level. As a result, shader profilers are now a must-have tool for optimizing ray tracing.
GPU-driven rendering has long been a major goal for many game applications. It enables better scalability for handling large virtual scenes and reduces cases where the CPU could bottleneck a game’s performance. Short of running the game’s logic on the GPU, I see the pinnacle of GPU-driven rendering as a scenario in which the CPU sends the GPU only the new frame’s camera information…
When it comes to game application performance, GPU-driven rendering enables better scalability for handling large virtual scenes. Direct3D 12 (D3D12) introduces work graphs as a programming paradigm that enables the GPU to generate work for itself on the fly. For an introduction to work graphs, see Advancing GPU-Driven Rendering with Work Graphs in Direct3D 12. This post features a Direct3D…
This post was updated on April 17, 2024. For developers working on ray tracing applications for both DirectX 12 and Vulkan, ray tracing validation is here to help you improve performance, find hard-to-debug issues, and root-cause crashes. Unlike existing debug solutions, ray tracing validation performs checks at the driver level, which enables it to identify potential problems that…
Swap chains are an integral part of how you get rendering data output to a screen. They usually consist of some group of output-ready buffers, each of which can be rendered to one at a time in rotation. In parallel with rendering to one of a swap chain’s buffers, some other buffer in the swap chain is generally read from for display output. This post covers best practices when working with…
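The rotation described in this excerpt can be sketched as a small ring of buffers. This is a toy model only, not any graphics API: the `SwapChain` class, its `present` method, and the buffer names are hypothetical and exist purely for illustration.

```python
class SwapChain:
    """Toy model of a swap chain: a ring of buffers rendered to in rotation."""

    def __init__(self, buffer_count=3):
        if buffer_count < 2:
            raise ValueError("need at least two buffers to render and display in parallel")
        # Hypothetical buffer names; a real swap chain holds GPU resources.
        self.buffers = [f"buffer{i}" for i in range(buffer_count)]
        self.render_index = 0                  # buffer currently being rendered to
        self.display_index = buffer_count - 1  # buffer currently read for display

    def present(self):
        """Hand the finished buffer to the display, then rotate the render target."""
        self.display_index = self.render_index
        self.render_index = (self.render_index + 1) % len(self.buffers)
        return self.buffers[self.display_index]


chain = SwapChain(buffer_count=3)
presented = [chain.present() for _ in range(4)]
```

After each `present`, the render and display indices always differ, which is the property that lets rendering and scan-out proceed in parallel.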
Intrinsics can be thought of as higher-level abstractions of specific hardware instructions. They offer direct access to low-level operations or hardware-specific features, enabling increased performance. In this way, operations can be performed across threads within a warp, also known as a wavefront. The following code is an example with SM6…
There are some useful intrinsic functions in the NVIDIA GPU instruction set that are not included in standard graphics APIs. This post has been updated from the original 2016 version to add information about new intrinsics and cross-vendor APIs in DirectX and Vulkan. For example, a shader can use warp shuffle instructions to exchange data between threads in a warp without going through shared memory…
By using descriptor types, you can bind resources to shaders and specify how those resources are accessed. This creates efficient communication between the CPU and GPU and enables shaders to access the necessary data during rendering.
NVIDIA offers a large suite of tools for graphics debugging, including NVIDIA Nsight Systems for CPU debugging and Nsight Graphics for GPU debugging. Nsight Aftermath is useful for analyzing crash dumps. Thanks to Patrick Neill, Jeffrey Kiel, Justin Kim, Andrew Allan, and Louis Bavoil for their help with this post.
This post covers best practices when working with shaders on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Shaders play a critical role in graphics programming by enabling you to control various aspects of the rendering process. They run on the GPU and are responsible for manipulating vertices, pixels, and other data.
Ray and path tracing algorithms construct light paths by starting at the camera or the light sources and intersecting rays with the scene geometry. As objects are hit, new secondary rays are generated on these surfaces to continue the paths. In theory, these secondary rays will not yield an intersection with the same triangle again, as intersections at a distance of zero are excluded by the…
This post covers best practices when working with pipeline state objects on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Pipeline state objects (PSOs) define how input data is interpreted and rendered by the hardware when submitting work to the GPUs. Proper management of PSOs is essential for optimal usage of system…
If you are a DirectX 12 (DX12) game developer, you may have noticed that GPU times displayed in real time in your game HUD may change over time for a given pass. This may be the case even if nothing has changed on the application side. One reason for GPU time variations may be GPU Boost dynamically changing the GPU core clock frequency. Still, even with GPU Boost disabled using the DX12…
This post is part of a series about optimizing end-to-end AI. While NVIDIA hardware can process the individual operations that constitute a neural network incredibly fast, it is important to ensure that you are using the tools correctly. Using the respective tools such as ONNX Runtime or TensorRT out of the box with ONNX usually gives you good performance, but why settle for good performance…
Learn how AI is boosting creative applications for creators during NVIDIA GTC 2023, March 20-23.
This post is the first in a series about optimizing end-to-end AI. The great thing about the GPU is that it offers tremendous parallelism; it allows you to perform many tasks at the same time. At its most granular level, this comes down to the fact that there are thousands of tiny processing cores that run the same instruction at the same time. But that is not where such parallelism stops.
Load times. They are the bane of any developer trying to construct a seamless experience. Trying to hide loading in a game by forcing a player to shimmy through narrow passages or take extremely slow elevators breaks immersion. Now, developers have a better solution. NVIDIA collaborated with Microsoft and IHV partners to develop GDeflate for DirectStorage 1.1, an open standard for GPU…
Swap groups and swap barriers are well-known methods to synchronize buffer swaps between different windows on the same system and on distributed systems, respectively. Initially introduced for OpenGL, they were later extended through public NvAPI interfaces and supported in DirectX 9 through 12. NVIDIA now introduces the concept of present barriers. They combine swap groups and swap barriers…
NVIDIA was fortunate enough to speak with 3D prop and environment artist Daniel Martinger, who captured the attention of the computer graphics world with his stunningly realistic path-traced rendered scene entitled “The Carpenter’s Cellar.” Below, Martinger discusses how he created this piece of work using an NVIDIA RTX 3090 and Unreal Engine 5, along with where he finds inspiration.
This post was originally published on the Developer Zone. Depth precision is a pain that every graphics programmer has to struggle with sooner or later. Many articles and papers have been written on the topic, and a variety of different depth buffer formats and setups are found across different games, engines, and devices. Because of the way it interacts with perspective projection…
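To give a feel for how depth interacts with perspective projection, here is a minimal sketch. It assumes a conventional (non-reversed) Direct3D-style mapping of view-space distance to a [0, 1] depth value; the function names and the near/far values are illustrative assumptions, not taken from the post.

```python
def depth_value(z, near, far):
    """Map a view-space distance z in [near, far] to a [0, 1] depth value.

    Assumes the standard non-reversed perspective mapping
    d(z) = far / (far - near) * (1 - near / z).
    """
    return (far / (far - near)) * (1.0 - near / z)


def distance_at_depth(d, near, far):
    """Inverse of depth_value: the view-space distance stored as depth d."""
    return near / (1.0 - d * (far - near) / far)


near, far = 0.1, 1000.0  # illustrative clip planes
# Distance at which half of the [0, 1] depth range is already used up.
half_range_distance = distance_at_depth(0.5, near, far)
```

With these assumed clip planes, `half_range_distance` lands just under 0.2, so half of the depth values cover roughly the first 0.1 units past the near plane, while the remaining half must cover everything out to the far plane.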
Modern graphics APIs, such as Direct3D 12 and Vulkan, are designed to provide relatively low-level access to the GPU and eliminate the GPU driver overhead associated with API translation. This low-level interface allows applications to have more control over the system and provides the ability to manage pipelines, shader compilation, memory allocations, and resource descriptors in a way that is…
Designing rich content and graphics for VR experiences means creating complex materials and high-resolution textures. But rendering all that content at VR resolutions and frame rates can be challenging, especially when rendering at the highest quality. You can address this challenge by using variable rate shading (VRS) to focus shader resources on certain parts of an image, specifically…
Mesh shaders were introduced with the Turing architecture and are shipping with Ampere as well. In this post, I offer a detailed look at mesh shader experiences on these hardware architectures so far. The context of these results was primarily CAD and DCC viewports or VR. However, some of it may be applicable to games as well, as they increase in geometric complexity.
To get the most out of DirectX 12 Ultimate, we’ve provided early public access to the NVIDIA DirectX Ultimate Developer Preview Driver [450.82] on our DirectX developer page, for both NVIDIA GeForce and NVIDIA Quadro. This new driver will let you go hands-on with DirectX 12 Ultimate’s exciting new features: DXR traces paths of light with physics calculations…
GPU performance events can be used to instrument your game by labeling regions and marking important occurrences. A performance event represents a logical, hierarchical grouping of work, consisting of a begin/end marker pair. There are best practices for GPU performance events that are universally used by profiling tools such as NVIDIA Nsight Graphics and NVIDIA Nsight Systems…
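The begin/end pairing and hierarchy described in this excerpt can be sketched as follows. This is a conceptual model only: `MarkerRecorder`, `range`, and `mark` are hypothetical names, not the API of any graphics or profiling library, and the pass labels are made up for the example.

```python
from contextlib import contextmanager


class MarkerRecorder:
    """Toy recorder for hierarchical begin/end performance events."""

    def __init__(self):
        self.events = []  # (kind, nesting depth, label)
        self._depth = 0

    @contextmanager
    def range(self, label):
        """Emit a begin marker on entry and the matching end marker on exit."""
        self.events.append(("begin", self._depth, label))
        self._depth += 1
        try:
            yield
        finally:
            self._depth -= 1
            self.events.append(("end", self._depth, label))

    def mark(self, label):
        """Record a single point-in-time occurrence inside the current range."""
        self.events.append(("marker", self._depth, label))


recorder = MarkerRecorder()
with recorder.range("Frame"):
    with recorder.range("ShadowPass"):
        recorder.mark("CascadeSwitch")
    with recorder.range("LightingPass"):
        pass
```

Using a context manager guarantees that every begin marker gets its end marker, even if the enclosed work raises, which keeps the hierarchy balanced.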
For information about VRSS 2, see Delivering Dynamic Foveated Rendering with NVIDIA VRSS 2. The virtual reality (VR) industry is in the midst of a new hardware cycle, with higher-resolution headsets and better optics being the key focus points for device manufacturers. Similarly, on the software front, there has been a wave of content-rich applications and an emphasis on flawless VR…
Figuring out how to reduce the GPU frame time of a rendering application on PC is challenging for even the most experienced PC game developers. In this blog post, we describe a performance triage method we’ve been using internally at NVIDIA to let us figure out the main performance limiters of any given GPU workload (also known as perf marker or call range), using NVIDIA-specific hardware metrics.
Epic Games is adding “Early Access” support for ray tracing through the DirectX Raytracing API (DXR) to Unreal Engine with the pending release of Unreal Engine 4.22. Demos dating back to GDC 2018 show impressive ray tracing results using DXR. However, UE 4.22 integrates ray tracing support into the mainline branch, making ray tracing available to the wider world. While 4.22 is an early release…
RTX is NVIDIA’s new platform for hybrid rendering, allowing the combination of rasterization and compute-based techniques with hardware-accelerated ray tracing and deep learning. It has already been adopted in a number of games and engines. Based on those experiences, this blog aims to give the reader an insight into how RTX ray tracing is best integrated into real-time applications today.
RTX introduces an exciting and fundamental shift in the way lighting systems work in games and applications. In this video series, NVIDIA Engineers Martin-Karl Lefrancois and Pascal Gautron help you get started with real-time ray tracing. You’ll learn how data and rendering are managed, how acceleration structures and shaders work, and what new components are needed for your pipeline.
Ray tracing will soon revolutionize the way video games look. Ray tracing simulates how rays of light hit and bounce off of objects, enabling developers to create stunning imagery that lives up to the word “photorealistic”. Ignacio Llamas and Edward Liu from NVIDIA’s real-time rendering software team will introduce you to real-time ray tracing in this series of seven short videos.
The NVIDIA SMP (simultaneous multi-projection) Assist NVAPI driver extension is a simple method for integrating Multi-Res Shading and Lens-Matched Shading into a VR application. It encapsulates a notable amount of state setup and API calls into a simplified API, thereby substantially reducing the complexity of integrating NVIDIA VRWorks into an application. Specifically, the SMP Assist driver…
When writing compute shaders, it’s often necessary to communicate values between threads. This is typically done through shared memory. Kepler GPUs introduced shuffle intrinsics, which enable threads of a warp to directly read each other’s registers, avoiding memory access and synchronization. Shared memory is relatively fast but instructions that operate without using memory of any kind are…
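As a rough illustration of threads reading each other's registers, the sketch below simulates a shuffle-down sum reduction on the CPU. It is not shader code: `shuffle_down` and `warp_reduce_sum` are hypothetical helpers that loosely model the behavior of a shuffle-down intrinsic such as CUDA's `__shfl_down_sync`, and the warp size of 32 and power-of-two lane count are assumptions.

```python
WARP_SIZE = 32  # assumed warp width


def shuffle_down(registers, delta):
    """Each lane reads the register of lane + delta.

    Lanes whose source would fall outside the warp keep their own value.
    """
    count = len(registers)
    return [registers[lane + delta] if lane + delta < count else registers[lane]
            for lane in range(count)]


def warp_reduce_sum(registers):
    """Sum all lanes with log2(n) shuffle steps; the total ends up in lane 0.

    Assumes the number of lanes is a power of two.
    """
    values = list(registers)
    offset = len(values) // 2
    while offset > 0:
        shifted = shuffle_down(values, offset)
        values = [own + other for own, other in zip(values, shifted)]
        offset //= 2
    return values[0]


lane_values = list(range(WARP_SIZE))  # lane i holds the value i
total = warp_reduce_sum(lane_values)
```

Each step halves the shuffle distance, so 32 lanes are summed in five exchange steps with no intermediate buffer standing in for shared memory.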