As accelerated computing continues to drive application performance in all areas of AI and scientific computing, there’s a renewed interest in GPU optimization techniques to ensure applications obtain the best possible performance. As an application developer, you have many ways to program GPUs, up and down the software stack. In this post, we introduce some of the different levels of the stack…
Parallel Thread Execution (PTX) is a virtual machine instruction set architecture that has been part of CUDA from its beginning. You can think of PTX as the assembly language of the NVIDIA CUDA GPU computing platform. In this post, we’ll explain what that means, what PTX is for, and what you need to know about it to make the most of CUDA for your applications. We’ll start by walking through…