Jonathan Bentz – NVIDIA Technical Blog
News and tutorials for developers, data scientists, and IT admins
Updated: 2025-07-02T20:43:26Z
http://www.open-lab.net/blog/feed/

Advanced NVIDIA CUDA Kernel Optimization Techniques: Handwritten PTX
Jonathan Bentz
http://www.open-lab.net/blog/?p=102881
Updated: 2025-07-02T20:43:26Z, published: 2025-07-02T20:43:19Z

As accelerated computing continues to drive application performance in all areas of AI and scientific computing, there’s a renewed interest in GPU optimization techniques to ensure applications obtain the best possible performance. As an application developer, there are many ways to program GPUs, up and down the software stack. In this post, we introduce some of the different levels of the stack…

Just Released: CUDA 12.9
Jonathan Bentz
http://www.open-lab.net/blog/?p=99599
Updated: 2025-05-15T19:07:49Z, published: 2025-05-05T15:39:54Z

New features include enhancements to confidential computing and family-specific features and targets supported by NVCC.

NVIDIA Blackwell and NVIDIA CUDA 12.9 Introduce Family-Specific Architecture Features
Jonathan Bentz
http://www.open-lab.net/blog/?p=98753
Updated: 2025-05-15T19:08:27Z, published: 2025-05-01T22:39:39Z

One of the earliest architectural design decisions that went into the CUDA platform for NVIDIA GPUs was support for backward compatibility of GPU code. This...

NVIDIA cuPyNumeric 25.03 Now Fully Open Source with PIP and HDF5 Support
Jonathan Bentz
http://www.open-lab.net/blog/?p=99089
Updated: 2025-05-15T19:08:44Z, published: 2025-04-23T19:26:07Z

NVIDIA cuPyNumeric is a library that aims to provide a distributed and accelerated drop-in replacement for NumPy built on top of the Legate framework. It brings zero-code-change scaling to multi-GPU and multinode (MGMN) accelerated computing. cuPyNumeric 25.03 is a milestone update that introduces powerful new capabilities and enhanced accessibility for users and developers alike…

Understanding PTX, the Assembly Language of CUDA GPU Computing
Jonathan Bentz
http://www.open-lab.net/blog/?p=96891
Updated: 2025-04-23T00:32:55Z, published: 2025-03-12T18:00:00Z

Parallel thread execution (PTX) is a virtual machine instruction set architecture that has been part of CUDA from its beginning. You can think of PTX as the assembly language of the NVIDIA CUDA GPU computing platform. In this post, we’ll explain what that means, what PTX is for, and what you need to know about it to make the most of CUDA for your applications. We’ll start by walking through…

Optimizing Compile Times for CUDA C++
Jonathan Bentz
http://www.open-lab.net/blog/?p=96775
Updated: 2025-04-23T00:36:07Z, published: 2025-03-10T18:02:27Z

In modern software development, time is an incredibly valuable resource, especially during the compilation process. For developers working with CUDA C++ on large-scale GPU-accelerated applications, optimizing compile times can significantly enhance productivity and streamline the entire development cycle. When using the compiler for offline compilation, efficient compilation times enable…

CUDA Toolkit Now Available for NVIDIA Blackwell
Jonathan Bentz
http://www.open-lab.net/blog/?p=95358
Updated: 2025-04-23T14:58:16Z, published: 2025-01-31T19:17:12Z

The latest release of the CUDA Toolkit, version 12.8, continues to push accelerated computing performance in data sciences, AI, scientific computing, and computer graphics and simulation, using the latest NVIDIA CPUs and GPUs. This post highlights some of the new features and enhancements included with this release: CUDA Toolkit 12.8 is the first version of the Toolkit to support…
