Tony Scudiero – NVIDIA Technical Blog News and tutorials for developers, data scientists, and IT admins 2025-07-10T18:29:56Z http://www.open-lab.net/blog/feed/ Tony Scudiero <![CDATA[Advanced NVIDIA CUDA Kernel Optimization Techniques: Handwritten PTX]]> http://www.open-lab.net/blog/?p=102881 2025-07-10T18:29:56Z 2025-07-02T20:43:19Z As accelerated computing continues to drive application performance in all areas of AI and scientific computing, there's a renewed interest in GPU optimization...]]>

As accelerated computing continues to drive application performance in all areas of AI and scientific computing, there’s a renewed interest in GPU optimization techniques to ensure applications obtain the best possible performance. As an application developer, there are many ways to program GPUs, up and down the software stack. In this post, we introduce some of the different levels of the stack…

]]>
Tony Scudiero <![CDATA[NVIDIA Blackwell and NVIDIA CUDA 12.9 Introduce Family-Specific Architecture Features]]> http://www.open-lab.net/blog/?p=98753 2025-05-15T19:08:27Z 2025-05-01T22:39:39Z One of the earliest architectural design decisions that went into the CUDA platform for NVIDIA GPUs was support for backward compatibility of GPU code. This...]]>

]]>
Tony Scudiero <![CDATA[Understanding PTX, the Assembly Language of CUDA GPU Computing]]> http://www.open-lab.net/blog/?p=96891 2025-04-23T00:32:55Z 2025-03-12T18:00:00Z Parallel thread execution (PTX) is a virtual machine instruction set architecture that has been part of CUDA from its beginning. You can think of PTX as the...]]>

Parallel thread execution (PTX) is a virtual machine instruction set architecture that has been part of CUDA from its beginning. You can think of PTX as the assembly language of the NVIDIA CUDA GPU computing platform. In this post, we’ll explain what that means, what PTX is for, and what you need to know about it to make the most of CUDA for your applications. We’ll start by walking through…

]]>
Tony Scudiero <![CDATA[Separate Compilation and Linking of CUDA C++ Device Code]]> http://www.open-lab.net/blog/parallelforall/?p=2522 2022-08-21T23:37:02Z 2014-04-22T15:03:27Z Managing complexity in large programs requires breaking them down into components that are responsible for small, well-defined portions of the overall program....]]>

]]>