Mike Murphy – NVIDIA Technical Blog

News and tutorials for developers, data scientists, and IT admins
Feed: http://www.open-lab.net/blog/feed/ (updated 2024-06-27T18:17:56Z)

Mike Murphy: Runtime Fatbin Creation Using the NVIDIA CUDA Toolkit 12.4 Compiler
http://www.open-lab.net/blog/?p=83992
Updated 2024-06-27T18:17:56Z, published 2024-06-18T17:28:55Z

CUDA Toolkit 12.4 introduced a new nvFatbin library for creating fatbins at runtime. Fatbins, otherwise known as NVIDIA device code fat binaries, are containers...

Comments: 1

Mike Murphy: CUDA 12.0 Compiler Support for Runtime LTO Using nvJitLink Library
http://www.open-lab.net/blog/?p=59762
Updated 2023-06-12T08:12:19Z, published 2023-01-17T22:40:43Z

CUDA Toolkit 12.0 introduces a new nvJitLink library for Just-in-Time Link Time Optimization (JIT LTO) support. In the early days of CUDA, to get maximum performance, developers had to build and compile CUDA kernels as a single source file in whole program mode. This kept SDKs and applications with large code bases spanning multiple files, which required separate compilation, from porting…

Comments: 6

Mike Murphy: Reducing Application Build Times Using CUDA C++ Compilation Aids
http://www.open-lab.net/blog/?p=38989
Updated 2022-08-21T23:52:55Z, published 2021-10-26T05:04:15Z

The CUDA 11.5 C++ compiler addresses a growing customer request: how to reduce CUDA application build times. Along with eliminating unused...

Comments: 1

Mike Murphy: Programming Efficiently with the NVIDIA CUDA 11.3 Compiler Toolchain
http://www.open-lab.net/blog/?p=29901
Updated 2023-12-30T00:42:34Z, published 2021-04-16T00:40:00Z

The CUDA 11.3 release of the CUDA C++ compiler toolchain incorporates new features aimed at improving developer productivity and code performance. NVIDIA is introducing cu++filt, a standalone demangler tool that allows you to decode mangled function names to aid source code correlation. Starting with this release, the NVRTC shared library versioning scheme is relaxed to facilitate compatible…

Comments: 2

Mike Murphy: Boosting Productivity and Performance with the NVIDIA CUDA 11.2 C++ Compiler
http://www.open-lab.net/blog/?p=23916
Updated 2022-08-21T23:41:02Z, published 2021-02-13T02:30:28Z

The 11.2 CUDA C++ compiler incorporates features and enhancements aimed at improving developer productivity and the performance of GPU-accelerated applications. The compiler toolchain gets an LLVM upgrade to 7.0, which enables new features and can help improve compiler code generation for NVIDIA GPUs. Link-time optimization (LTO) for device code (also known as device LTO)…

Comments: 0

Mike Murphy: Improving GPU Application Performance with NVIDIA CUDA 11.2 Device Link Time Optimization
http://www.open-lab.net/blog/?p=23930
Updated 2022-08-21T23:41:02Z, published 2021-02-13T01:27:00Z

CUDA 11.2 features link time optimization (LTO) for device code in GPU-accelerated applications. Device LTO brings the performance advantages of device code optimization, previously possible only in whole program compilation mode, to the separate compilation mode introduced in CUDA 5.0. Separate compilation mode allows CUDA device kernel code to span…

Comments: 16

Mike Murphy: Separate Compilation and Linking of CUDA C++ Device Code
http://www.open-lab.net/blog/parallelforall/?p=2522
Updated 2022-08-21T23:37:02Z, published 2014-04-22T15:03:27Z

Managing complexity in large programs requires breaking them down into components that are responsible for small, well-defined portions of the overall program....

Comments: 39