AI is augmenting high-performance computing (HPC) with novel approaches to data processing, simulation, and modeling. Because of the computational requirements of these new AI workloads, HPC is scaling up at a rapid pace. To enable applications to scale to multi-GPU and multi-node platforms, HPC tools and libraries must support that growth. NVIDIA provides a comprehensive ecosystem of…
While part 1 focused on the usage of the new NVIDIA cuTENSOR 2.0 CUDA math library, this post introduces a variety of usage modes beyond that, specifically usage from Python and Julia. It also explores applications of cuTENSOR 2.0 and demonstrates its performance based on benchmarks in a number of application domains. For more information…
NVIDIA cuTENSOR is a CUDA math library that provides optimized implementations of tensor operations where tensors are dense, multi-dimensional arrays or array slices. The release of cuTENSOR 2.0 represents a major update—in both functionality and performance—over its predecessor. This version reimagines its APIs to be more expressive, including advanced just-in-time compilation capabilities all…
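As a rough illustration of the kind of operation cuTENSOR accelerates, the sketch below performs a single-precision contraction with the cuTENSOR 2.0 plan-based API described in part 1. It is a minimal sketch only: error handling is omitted, A_d, B_d, and C_d are assumed to be preallocated device buffers, and the exact enum values and function signatures should be verified against the cuTENSOR 2.0 headers and documentation.

```
// Minimal sketch: C[m,n] = alpha * A[m,k] * B[k,n] + beta * C[m,n] in FP32
// using the cuTENSOR 2.0 plan-based API (verify signatures against cutensor.h).
// Error checking is omitted; A_d, B_d, C_d are assumed preallocated device buffers.
#include <cuda_runtime.h>
#include <cutensor.h>
#include <cstdint>
#include <vector>

void contract(const float* A_d, const float* B_d, float* C_d,
              int64_t m, int64_t n, int64_t k, cudaStream_t stream)
{
    cutensorHandle_t handle;
    cutensorCreate(&handle);

    // Mode labels name each tensor dimension; shared labels ('k') are contracted.
    std::vector<int32_t> modeA{'m', 'k'}, modeB{'k', 'n'}, modeC{'m', 'n'};
    std::vector<int64_t> extentA{m, k}, extentB{k, n}, extentC{m, n};
    const uint32_t kAlignment = 128;  // alignment of the device pointers, in bytes

    cutensorTensorDescriptor_t descA, descB, descC;
    cutensorCreateTensorDescriptor(handle, &descA, 2, extentA.data(), nullptr, CUTENSOR_R_32F, kAlignment);
    cutensorCreateTensorDescriptor(handle, &descB, 2, extentB.data(), nullptr, CUTENSOR_R_32F, kAlignment);
    cutensorCreateTensorDescriptor(handle, &descC, 2, extentC.data(), nullptr, CUTENSOR_R_32F, kAlignment);

    // Describe the contraction D = alpha * op(A) * op(B) + beta * op(C); D aliases C here.
    cutensorOperationDescriptor_t desc;
    cutensorCreateContraction(handle, &desc,
                              descA, modeA.data(), CUTENSOR_OP_IDENTITY,
                              descB, modeB.data(), CUTENSOR_OP_IDENTITY,
                              descC, modeC.data(), CUTENSOR_OP_IDENTITY,
                              descC, modeC.data(),
                              CUTENSOR_COMPUTE_DESC_32F);

    // Kernel selection; CUTENSOR_JIT_MODE_DEFAULT instead of NONE requests a JIT-compiled kernel.
    cutensorPlanPreference_t pref;
    cutensorCreatePlanPreference(handle, &pref, CUTENSOR_ALGO_DEFAULT, CUTENSOR_JIT_MODE_NONE);

    uint64_t workspaceSize = 0;
    cutensorEstimateWorkspaceSize(handle, desc, pref, CUTENSOR_WORKSPACE_DEFAULT, &workspaceSize);

    cutensorPlan_t plan;
    cutensorCreatePlan(handle, &plan, desc, pref, workspaceSize);

    void* work = nullptr;
    cudaMalloc(&work, workspaceSize);

    const float alpha = 1.0f, beta = 0.0f;
    cutensorContract(handle, plan, &alpha, A_d, B_d, &beta, C_d, C_d, work, workspaceSize, stream);

    cudaFree(work);
    cutensorDestroyPlan(plan);
    cutensorDestroyOperationDescriptor(desc);
    cutensorDestroyPlanPreference(pref);
    cutensorDestroyTensorDescriptor(descA);
    cutensorDestroyTensorDescriptor(descB);
    cutensorDestroyTensorDescriptor(descC);
    cutensorDestroy(handle);
}
```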
For over a decade, traditional industrial process modeling and simulation approaches have struggled to fully leverage multicore CPUs or acceleration devices to run simulation and optimization calculations in parallel. Multicore linear solvers used in process modeling and simulation have not achieved expected improvements, and in certain cases have underperformed optimized single-core solvers.
High-performance computing (HPC) powers applications in simulation and modeling, healthcare and life sciences, industry and engineering, and more. In the modern data center, HPC synergizes with AI, harnessing data in transformative new ways. The performance and throughput demands of next-generation HPC applications call for an accelerated computing platform that can handle diverse workloads…
The latest release of CUDA Toolkit 12.2 introduces a range of essential new features, modifications to the programming model, and enhanced support for hardware capabilities accelerating CUDA applications. Now generally available from NVIDIA, CUDA Toolkit 12.2 includes many new capabilities, both major and minor. The following post offers an overview of many of the key…
CUDA Toolkit 12.0 introduces a new nvJitLink library for Just-in-Time Link Time Optimization (JIT LTO) support. In the early days of CUDA, to get maximum performance, developers had to build and compile CUDA kernels as a single source file in whole program compilation mode. This limited SDKs and applications with large swaths of code, spanning multiple files that required separate compilation, from porting…
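As a rough sketch of what JIT LTO with nvJitLink can look like, the fragment below links two pieces of LTO-IR at run time into a single cubin. The file names a.ltoir and b.ltoir are hypothetical placeholders for device code compiled offline with LTO enabled, error checking is omitted, and the option strings and enum values should be verified against the nvJitLink documentation.

```
// Sketch: runtime (JIT) link-time optimization with nvJitLink.
// Assumes a.ltoir and b.ltoir contain LTO-IR generated offline, e.g. with nvcc -dc -dlto.
#include <nvJitLink.h>
#include <cstdio>
#include <cstdlib>

int main()
{
    nvJitLinkHandle handle;
    const char* options[] = {"-lto", "-arch=sm_80"};   // optimize during linking, target SM 8.0
    nvJitLinkCreate(&handle, 2, options);

    // Add the separately compiled pieces of device code to the link.
    nvJitLinkAddFile(handle, NVJITLINK_INPUT_LTOIR, "a.ltoir");
    nvJitLinkAddFile(handle, NVJITLINK_INPUT_LTOIR, "b.ltoir");

    // Run link-time optimization across all inputs and retrieve the linked cubin.
    nvJitLinkComplete(handle);

    size_t cubinSize = 0;
    nvJitLinkGetLinkedCubinSize(handle, &cubinSize);
    void* cubin = std::malloc(cubinSize);
    nvJitLinkGetLinkedCubin(handle, cubin);

    // The cubin can now be loaded with the CUDA driver API (e.g. cuModuleLoadData) and launched.
    std::printf("linked cubin: %zu bytes\n", cubinSize);

    nvJitLinkDestroy(&handle);
    std::free(cubin);
    return 0;
}
```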
NVIDIA announces the newest CUDA Toolkit software release, 12.0. This release is the first major release in many years and focuses on new programming models and CUDA application acceleration through new hardware capabilities. For more information, watch the YouTube Premiere webinar, CUDA 12.0: New Features and Beyond. You can now target architecture-specific features and instructions…
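As a small, hypothetical illustration of targeting architecture-specific capabilities, the sketch below guards a Hopper-only code path with __CUDA_ARCH__ and is intended to be built with an architecture-specific target such as -arch=sm_90a; the kernel body is a placeholder rather than an actual Hopper-specific instruction sequence.

```
// Sketch: guarding an architecture-specific code path.
// Compile with, for example:  nvcc -arch=sm_90a arch_specific.cu
// The "a" suffix requests architecture-specific (non-portable) features of the target GPU.
#include <cstdio>

__global__ void kernel(float* out)
{
#if __CUDA_ARCH__ >= 900
    // Hopper-or-newer path: this is where sm_90a-only instructions/intrinsics would go.
    out[threadIdx.x] = 2.0f * threadIdx.x;
#else
    // Fallback path for older architectures.
    out[threadIdx.x] = static_cast<float>(threadIdx.x);
#endif
}

int main()
{
    float* out = nullptr;
    cudaMalloc(&out, 32 * sizeof(float));
    kernel<<<1, 32>>>(out);
    cudaDeviceSynchronize();
    cudaFree(out);
    std::puts("done");
    return 0;
}
```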
NVIDIA announces the newest release of the CUDA development environment, CUDA 11.6. This release is focused on enhancing the programming model and performance of your CUDA applications. CUDA continues to push the boundaries of GPU acceleration and lay the foundation for new applications in HPC, visualization, AI, ML and DL, and data science. CUDA 11.6 has several important features.
NVIDIA announces the newest release of the CUDA development environment, CUDA 11.5. CUDA 11.5 is focused on enhancing the programming model and performance of your CUDA applications. CUDA continues to push the boundaries of GPU acceleration and lay the foundation for new applications in HPC, visualization, AI, ML and DL, and data science. CUDA 11.5 has several important features.
WSL2 is available on Windows 11 outside the Windows Insider Preview. For more information about what is supported, see the CUDA on WSL User Guide. In June 2020, we released the first NVIDIA Display Driver that enabled GPU acceleration in the Windows Subsystem for Linux (WSL) 2 for Windows Insider Program (WIP) Preview users. At that time, it was still an early preview with a limited set of…
The CUDA 11.3 release of the CUDA C++ compiler toolchain incorporates new features aimed at improving developer productivity and code performance. NVIDIA is introducing cu++flt, a standalone demangler tool that allows you to decode mangled function names to aid source code correlation. Starting with this release, the NVRTC shared library versioning scheme is relaxed to facilitate compatible…
The 11.2 CUDA C++ compiler incorporates features and enhancements aimed at improving developer productivity and the performance of GPU-accelerated applications. The compiler toolchain gets an LLVM upgrade to 7.0, which enables new features and can help improve compiler code generation for NVIDIA GPUs. Link-time optimization (LTO) for device code (also known as device LTO)…
CUDA 11.2 features powerful link-time optimization (LTO) for device code in GPU-accelerated applications. Device LTO brings the performance advantages of device code optimization that were previously only possible in whole-program compilation mode to the separate compilation mode, which was introduced in CUDA 5.0. Separate compilation mode allows CUDA device kernel code to span across…
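As a sketch of how separate compilation and device LTO fit together, the hypothetical two-file example below defines a __device__ function in one translation unit and calls it from a kernel in another; the nvcc commands in the comments show a typical -dc/-dlto build and are worth double-checking against the nvcc documentation for your toolkit version.

```
// Sketch: device code split across two translation units, built with separate
// compilation plus device link-time optimization (device LTO). Build commands:
//   nvcc -dc -dlto particle.cu main.cu     // compile each file to LTO-capable objects
//   nvcc -dlto particle.o main.o -o app    // device-link with LTO, then host-link

// ---------------- particle.cu ----------------
__device__ float scale(float x)   // defined here, called from a kernel in main.cu
{
    return 2.0f * x;
}

// ---------------- main.cu ----------------
#include <cstdio>

__device__ float scale(float x);  // declaration only; resolved at device link time

__global__ void apply(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = scale(data[i]);   // with device LTO this call can be inlined
}

int main()
{
    const int n = 256;
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));
    apply<<<(n + 127) / 128, 128>>>(d, n);
    cudaDeviceSynchronize();
    cudaFree(d);
    std::puts("done");
    return 0;
}
```

With whole-program compilation, the call to scale could always be optimized together with its caller; device LTO restores that opportunity even though the definition lives in a separate file.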