Compilation – NVIDIA Technical Blog

Compilation – NVIDIA Technical Blog News and tutorials for developers, data scientists, and IT admins 2025-07-03T22:20:47Z http://www.open-lab.net/blog/feed/ Rob Armstrong <![CDATA[CUDA Toolkit 11.8 New Features Revealed]]> http://www.open-lab.net/blog/?p=55646 2024-08-28T17:44:48Z 2022-10-04T14:00:00Z

NVIDIA announces the newest CUDA Toolkit software release, 11.8. This release is focused on enhancing the programming model and CUDA application speedup through...]]>

NVIDIA announces the newest CUDA Toolkit software release, 11.8. This release is focused on enhancing the programming model and CUDA application speedup through new hardware capabilities. New architecture-specific features in NVIDIA Hopper and Ada Lovelace are initially being exposed through libraries and framework enhancements. The full programming model enhancements for the NVIDIA Hopper...

]]> 4 Dhruv Singal <![CDATA[N Ways to SAXPY: Demonstrating the Breadth of GPU Programming Options]]> http://www.open-lab.net/blog/?p=25483 2023-02-13T17:23:38Z 2021-04-06T21:11:00Z

Back in 2012, NVIDIAN Mark Harris wrote Six Ways to Saxpy, demonstrating how to perform the SAXPY operation on a GPU in multiple ways, using different languages...]]>

SAXPY

Back in 2012, NVIDIAN Mark Harris wrote Six Ways to Saxpy, demonstrating how to perform the SAXPY operation on a GPU in multiple ways, using different languages and libraries. Since then, programming paradigms have evolved and so has the NVIDIA HPC SDK. In this post, I demonstrate five ways to implement a simple SAXPY computation using NVIDIA GPUs. Why is this interesting?
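For readers unfamiliar with the operation, SAXPY (single-precision a times x plus y) updates each element as y[i] = a * x[i] + y[i]. The following is a minimal sequential sketch in standard C++, not code taken from the post. The function name `saxpy` and the length check are illustrative assumptions, and the GPU variants the post describes are not reproduced here.

```cpp
#include <algorithm>
#include <stdexcept>
#include <vector>

// SAXPY: y[i] = a * x[i] + y[i] for every element, in single precision.
// Sequential standard-library sketch; the name and signature are hypothetical.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    if (x.size() != y.size()) {
        throw std::invalid_argument("saxpy: x and y must have the same length");
    }
    // Binary std::transform reads x[i] and y[i] and writes the result back to y[i].
    std::transform(x.begin(), x.end(), y.begin(), y.begin(),
                   [a](float xi, float yi) { return a * xi + yi; });
}
```

Calling `saxpy(2.0f, x, y)` with `x = {1, 2, 3}` and `y = {10, 20, 30}` updates `y` in place. Each output element depends only on the matching input elements, which is why the operation parallelizes so readily.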

]]> 1 Arthy Sundaram <![CDATA[Boosting Productivity and Performance with the NVIDIA CUDA 11.2 C++ Compiler]]> http://www.open-lab.net/blog/?p=23916 2022-08-21T23:41:02Z 2021-02-13T02:30:28Z

The 11.2 CUDA C++ compiler incorporates features and enhancements aimed at improving developer productivity and the performance of GPU-accelerated applications....]]>

CudaC++

The 11.2 CUDA C++ compiler incorporates features and enhancements aimed at improving developer productivity and the performance of GPU-accelerated applications. The compiler toolchain gets an LLVM upgrade to 7.0, which enables new features and can help improve compiler code generation for NVIDIA GPUs. Link-time optimization (LTO) for device code (also known as device LTO)...

]]> 0 Mike Murphy <![CDATA[Improving GPU Application Performance with NVIDIA CUDA 11.2 Device Link Time Optimization]]> http://www.open-lab.net/blog/?p=23930 2022-08-21T23:41:02Z 2021-02-13T01:27:00Z

CUDA 11.2 features the powerful link time optimization (LTO) feature for device code in GPU-accelerated applications. Device LTO brings the performance...]]>

CUDA 11.2 features the powerful link time optimization (LTO) feature for device code in GPU-accelerated applications. Device LTO brings the performance advantages of device code optimization that were only possible in the whole program compilation mode to the separate compilation mode, which was introduced in CUDA 5.0. Separate compilation mode allows CUDA device kernel code to span across...

]]> 16 Ram Cherukuri <![CDATA[Enhancing Memory Allocation with New NVIDIA CUDA 11.2 Features]]> http://www.open-lab.net/blog/?p=22770 2024-08-28T17:54:37Z 2020-12-16T16:00:00Z

CUDA is the software development platform for building GPU-accelerated applications, providing all the components needed to develop applications targeting every...]]>

CUDA is the software development platform for building GPU-accelerated applications, providing all the components needed to develop applications targeting every NVIDIA GPU platform for general purpose compute acceleration. The latest CUDA release, CUDA 11.2, is focused on improving the user experience and application performance for CUDA developers. CUDA 11.2...

]]> 0 Michael Wolfe <![CDATA[Detecting Divergence Using PCAST to Compare GPU to CPU Results]]> http://www.open-lab.net/blog/?p=22165 2022-08-21T23:40:47Z 2020-11-18T16:00:00Z

Parallel Compiler Assisted Software Testing (PCAST) is a feature available in the NVIDIA HPC Fortran, C++, and C compilers. PCAST has two use cases. The first...]]>

PCAST helps you quickly isolate divergence between CPU and GPU results, so you can track down bugs or verify that your results are acceptable even when they aren't identical.

Parallel Compiler Assisted Software Testing (PCAST) is a feature available in the NVIDIA HPC Fortran, C++, and C compilers. PCAST has two use cases. The first is testing changes to parts of a program, new compile-time flags, or a port to a new compiler or to a new processor. You might want to test whether a new library gives the same result, or test the safety of adding OpenMP parallelism...
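The idea of accepting results that are close but not identical can be illustrated with a small tolerance-based comparison. This is only a sketch of the general technique in plain C++, not PCAST's interface: the function name `first_divergence`, its parameters, and the tolerance formula are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the index of the first element where `test` differs from `reference`
// by more than abs_tol + rel_tol * |reference[i]|, or -1 if no element does.
// If the lengths differ and the common prefix matches, the shorter length is
// returned. Hypothetical helper for illustration; not part of any NVIDIA API.
std::ptrdiff_t first_divergence(const std::vector<double>& reference,
                                const std::vector<double>& test,
                                double rel_tol, double abs_tol) {
    const std::size_t n = std::min(reference.size(), test.size());
    for (std::size_t i = 0; i < n; ++i) {
        const double diff = std::fabs(reference[i] - test[i]);
        const double bound = abs_tol + rel_tol * std::fabs(reference[i]);
        // Written as !(diff <= bound) so that a NaN on either side is flagged.
        if (!(diff <= bound)) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    if (reference.size() != test.size()) {
        return static_cast<std::ptrdiff_t>(n);
    }
    return -1;
}
```

With a relative tolerance of `1e-6`, a reference of `{1.0, 2.0, 3.0}` and a test vector of `{1.0, 2.0000001, 3.1}` are accepted at index 1 and flagged at index 2. Reporting the first divergent index, rather than a single pass/fail flag, is what helps narrow a discrepancy to a specific computation.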

]]> 0 Guray Ozen <![CDATA[Accelerating Fortran DO CONCURRENT with GPUs and the NVIDIA HPC SDK]]> http://www.open-lab.net/blog/?p=22198 2023-06-12T21:13:52Z 2020-11-16T16:00:00Z

Fortran developers have long been able to accelerate their programs using CUDA Fortran or OpenACC. For more up-to-date information, please read Using Fortran...]]>

Fortran Featured

Fortran developers have long been able to accelerate their programs using CUDA Fortran or OpenACC. For more up-to-date information, please read Using Fortran Standard Parallel Programming for GPU Acceleration, which aims to instruct developers on the advantages of using parallelism in standard languages for accelerated computing. Now with the latest 20.11 release of the NVIDIA HPC SDK...

]]> 28 Rekha Mukund <![CDATA[NVDLA Deep Learning Inference Compiler is Now Open Source]]> http://www.open-lab.net/blog/?p=15610 2022-08-21T23:39:37Z 2019-09-11T16:00:25Z

Designing new custom hardware accelerators for deep learning is clearly popular, but achieving state-of-the-art performance and efficiency with a new design is...]]>

NVDLA

Designing new custom hardware accelerators for deep learning is clearly popular, but achieving state-of-the-art performance and efficiency with a new design is a complex and challenging problem. Two years ago, NVIDIA opened the source for the hardware design of the NVIDIA Deep Learning Accelerator (NVDLA) to help advance the adoption of efficient AI inferencing in custom hardware designs.

]]> 1 Tim Besard <![CDATA[High-Performance GPU Computing in the Julia Programming Language]]> http://www.open-lab.net/blog/parallelforall/?p=8555 2022-08-21T23:38:31Z 2017-10-26T05:09:59Z

Julia is a high-level programming language for mathematical computing that is as easy to use as Python, but as fast as C. The language has been created with...]]>

Julia Programming Language

Julia is a high-level programming language for mathematical computing that is as easy to use as Python, but as fast as C. The language has been created with performance in mind, and combines careful language design with a sophisticated LLVM-based compiler [Bezanson et al. 2017]. Julia is already well regarded for programming multicore CPUs and large parallel computing systems...

]]> 5 Robert Maynard <![CDATA[Building Cross-Platform CUDA Applications with CMake]]> http://www.open-lab.net/blog/parallelforall/?p=8202 2022-08-21T23:38:21Z 2017-08-01T17:14:26Z

Cross-platform software development poses a number of challenges to your application's build process. How do you target multiple platforms without maintaining...]]>

Cross-platform software development poses a number of challenges to your application's build process. How do you target multiple platforms without maintaining multiple platform-specific build scripts, projects, or makefiles? What if you need to build CUDA code as part of the process? CMake is an open-source, cross-platform family of tools designed to build, test and package software across...

]]> 79 Jaydeep Marathe <![CDATA[New Compiler Features in CUDA 8]]> http://www.open-lab.net/blog/parallelforall/?p=7346 2022-08-21T23:38:01Z 2016-11-08T07:14:00Z

CUDA 8 is one of the most significant updates in the history of the CUDA platform. In addition to Unified Memory and the many new API and library features in...]]>

]]> 3 Paresh Kharya <![CDATA[Introducing the NVIDIA OpenACC Toolkit]]> http://www.open-lab.net/blog/parallelforall/?p=5569 2022-11-28T18:20:54Z 2015-07-13T07:01:55Z

Programmability is crucial to accelerated computing, and NVIDIA's CUDA Toolkit has been critical to the success of GPU computing. Over three million CUDA...]]>

Programmability is crucial to accelerated computing, and NVIDIA's CUDA Toolkit has been critical to the success of GPU computing. Over three million CUDA Toolkits have been downloaded since its first launch. However, there are many scientists and researchers yet to benefit from GPU computing. These scientists have limited time to learn and apply a parallel programming language, and they often have...

]]> 2 Alex Şuhan <![CDATA[MapD: Massive Throughput Database Queries with LLVM on GPUs]]> http://www.open-lab.net/blog/parallelforall/?p=5464 2022-10-10T18:45:15Z 2015-06-23T10:38:38Z

Note: this post was co-written by Alex Şuhan and Todd Mostak of MapD. At MapD our goal is to build the world's fastest big data analytics and visualization...]]>

At MapD our goal is to build the world's fastest big data analytics and visualization platform that enables lag-free interactive exploration of multi-billion row datasets. MapD supports standard SQL queries as well as a visualization API that maps OpenGL primitives onto SQL result sets. Although MapD is fast running on x86-64 CPUs, our real advantage stems from our ability to leverage the...

]]> 17 Tim Ellison <![CDATA[The Next Wave of Enterprise Performance with Java, POWER Systems, and NVIDIA GPUs]]> http://www.open-lab.net/blog/parallelforall/?p=3947 2023-12-29T21:46:44Z 2014-10-08T22:00:54Z

The Java ecosystem is the leading enterprise software development platform, with widespread industry support and deployment on platforms like the IBM WebSphere...]]>

The Java ecosystem is the leading enterprise software development platform, with widespread industry support and deployment on platforms like the IBM WebSphere Application Server product family. Java provides a powerful object-oriented programming language with a large developer ecosystem and developer-friendly features like automated memory management, program safety...

]]> 15 Tony Scudiero <![CDATA[Separate Compilation and Linking of CUDA C++ Device Code]]> http://www.open-lab.net/blog/parallelforall/?p=2522 2022-08-21T23:37:02Z 2014-04-22T15:03:27Z

Managing complexity in large programs requires breaking them down into components that are responsible for small, well-defined portions of the overall program....]]>

]]> 39 Mark Harris <![CDATA[CUDA Pro Tip: Understand Fat Binaries and JIT Caching]]> http://www.parallelforall.com/?p=1531 2022-08-21T23:36:54Z 2013-06-05T00:41:31Z

As NVIDIA GPUs evolve to support new features, the instruction set architecture naturally changes. Because applications must run on multiple generations of...]]>

GPU Pro Tip

As NVIDIA GPUs evolve to support new features, the instruction set architecture naturally changes. Because applications must run on multiple generations of GPUs, the NVIDIA compiler tool chain supports compiling for multiple architectures in the same application executable or library. CUDA also relies on the PTX virtual GPU ISA to provide forward compatibility, so that already deployed...

]]> 1