Louis Bavoil – NVIDIA Technical Blog

Louis Bavoil – NVIDIA Technical Blog News and tutorials for developers, data scientists, and IT admins 2025-05-29T19:05:01Z http://www.open-lab.net/blog/feed/ Louis Bavoil <![CDATA[Path Tracing Optimization in Indiana Jones™: Shader Execution Reordering and Live State Reductions]]> http://www.open-lab.net/blog/?p=98587 2025-05-29T19:05:01Z 2025-05-15T15:30:00Z

This post is part of the Path Tracing Optimizations in Indiana Jones™ series. While adding a path-tracing mode to Indiana Jones and the Great Circle™ in 2024, we used Shader Execution Reordering (SER), a feature available on NVIDIA GPUs since the NVIDIA GeForce RTX 40 Series, to improve GPU performance. To optimize the use of SER in the main path-tracing pass (“TraceMain”), we used the NVIDIA…

]]> Louis Bavoil <![CDATA[Path Tracing Optimizations in Indiana Jones™: Opacity MicroMaps and Compaction of Dynamic BLASs]]> http://www.open-lab.net/blog/?p=98909 2025-05-29T19:05:00Z 2025-05-15T15:30:00Z

The first post in this series, Path Tracing Optimization in Indiana Jones™: Shader Execution Reordering and Live State Reductions, covered ray-gen shader level optimizations that sped up the main path-tracing pass (“TraceMain”) of Indiana Jones and the Great Circle™. This second blog post covers additional GPU optimizations that were made at the level of the ray-tracing acceleration…

]]> Louis Bavoil <![CDATA[Powerful Shader Insights: Using Shader Debug Info with NVIDIA Nsight Graphics]]> http://www.open-lab.net/blog/?p=79026 2024-12-09T16:54:30Z 2024-03-14T20:00:00Z

As ray tracing becomes the predominant rendering technique in modern game engines, a single GPU RayGen shader can now perform most of the light simulation of a frame. To manage this level of complexity, it becomes necessary to observe a decomposition of shader performance at the HLSL or GLSL source-code level. As a result, shader profilers are now a must-have tool for optimizing ray tracing.

]]> Louis Bavoil <![CDATA[In-Game GPU Profiling for DirectX 12 Using SetBackgroundProcessingMode]]> http://www.open-lab.net/blog/?p=67605 2023-10-25T23:52:36Z 2023-07-10T17:00:00Z

If you are a DirectX 12 (DX12) game developer, you may have noticed that GPU times displayed in real time in your game HUD may change over time for a given pass. This may be the case even if nothing has changed on the application side. One reason for GPU time variations may be GPU Boost dynamically changing the GPU core clock frequency. Still, even with GPU Boost disabled using the DX12…

]]> 0 Louis Bavoil <![CDATA[Identifying Shader Limiters with the Shader Profiler in NVIDIA Nsight Graphics]]> http://www.open-lab.net/blog/?p=46703 2024-08-28T18:12:30Z 2022-04-26T00:18:24Z

UPDATE: NVIDIA Nsight Graphics 2023.3 and later feature the new Real-Time Shader Profiler, the first temporal sampling profiler for GPU shaders. This profiler enables you to examine the most expensive shaders at each moment in your frame. For more information, see GPU Trace UI in the Nsight Graphics User Guide. A less well-known but cool feature of NVIDIA Nsight Graphics is the Shader…

]]> 1 Louis Bavoil <![CDATA[Optimizing DX12 Resource Uploads to the GPU Using GPU Upload Heaps]]> http://www.open-lab.net/blog/?p=35247 2024-08-28T18:18:02Z 2021-08-11T22:00:12Z

This post was updated on May 19, 2023. How to optimize DX12 resource uploads from the CPU to the GPU over the PCIe bus is an old problem with many possible solutions, each with its own pros and cons. In this post, I show how moving cherry-picked DX12 UPLOAD heaps to GPU upload heaps can be a simple solution to speed up PCIe-limited workloads. Take the example of a vertex buffer (VB)…

]]> 1 Louis Bavoil <![CDATA[Optimizing Compute Shaders for L2 Locality using Thread-Group ID Swizzling]]> http://www.open-lab.net/blog/?p=18921 2022-08-21T23:40:23Z 2020-07-16T21:04:01Z

As part of my GDC 2019 session, Optimizing DX12/DXR GPU Workloads using Nsight Graphics: GPU Trace and the Peak-Performance-Percentage (P3) Method, I presented an optimization technique named thread-group tiling, a type of thread-group ID swizzling. This is an important technique for optimizing 2D, full-screen, compute shader passes that are doing widely spread texture fetches and which are…

]]> 1 Louis Bavoil <![CDATA[Using Nsight Systems for Fixing Stutters in Games]]> http://www.open-lab.net/blog/?p=16824 2024-08-28T18:24:49Z 2020-04-03T18:57:17Z

While working with game developers on pre-release games, NVIDIA has had a steady flow of bugs reported where a game stutters for multiple milliseconds during gameplay. These stutter bugs can ruin the experience of the gamer, possibly making the game unplayable (as with the release of Batman Arkham Knight on PC), so they should be treated with a high priority. Until 2018, the only tool that…

]]> 0 Louis Bavoil <![CDATA[Optimizing VK/VKR and DX12/DXR Applications Using Nsight Graphics: GPU Trace Advanced Mode Metrics]]> http://www.open-lab.net/blog/?p=16816 2024-08-28T18:24:55Z 2020-03-30T20:51:41Z

Many GPU performance analysis tools are based on a capture and replay mechanism, where a frame is first captured (either in-memory or to disk), and then replayed multiple times to be profiled. Nsight Graphics: GPU Trace differs in that it directly profiles the frames emitted by a live application, with no constraint on subsequent frames to be identical. This approach makes the tool simpler than…

]]> 0 Louis Bavoil <![CDATA[The Peak-Performance-Percentage Analysis Method for Optimizing Any GPU Workload]]> http://www.open-lab.net/blog/?p=12004 2024-08-28T18:26:12Z 2019-06-21T17:33:49Z

Figuring out how to reduce the GPU frame time of a rendering application on PC is challenging for even the most experienced PC game developers. In this blog post, we describe a performance triage method we’ve been using internally at NVIDIA to let us figure out the main performance limiters of any given GPU workload (also known as perf marker or call range), using NVIDIA-specific hardware metrics.

]]> 6