AI is augmenting high-performance computing (HPC) with novel approaches to data processing, simulation, and modeling. Because of the computational requirements of these new AI workloads, HPC is scaling up at a rapid pace. To enable applications to scale to multi-GPU and multi-node platforms, HPC tools and libraries must support that growth. NVIDIA provides a comprehensive ecosystem of…
While part 1 focused on the usage of the new NVIDIA cuTENSOR 2.0 CUDA math library, this post introduces a variety of usage modes beyond that, specifically usage from Python and Julia. It also explores applications of cuTENSOR 2.0 and demonstrates its performance based on benchmarks in a number of application domains. For more information…
NVIDIA cuTENSOR is a CUDA math library that provides optimized implementations of tensor operations where tensors are dense, multi-dimensional arrays or array slices. The release of cuTENSOR 2.0 represents a major update—in both functionality and performance—over its predecessor. This version reimagines its APIs to be more expressive, including advanced just-in-time compilation capabilities all…
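As a rough illustration of the kind of operation cuTENSOR accelerates, the sketch below performs a single-precision contraction with the cuTENSOR 2.0 plan-based API described in part 1. It is a minimal sketch only: error handling is omitted, A_d, B_d, and C_d are assumed to be preallocated device buffers, and the exact enum values and function signatures should be verified against the cuTENSOR 2.0 headers and documentation.

```
// Minimal sketch: C[m,n] = alpha * A[m,k] * B[k,n] + beta * C[m,n] in FP32
// using the cuTENSOR 2.0 plan-based API (verify signatures against cutensor.h).
// Error checking is omitted; A_d, B_d, C_d are assumed preallocated device buffers.
#include <cuda_runtime.h>
#include <cutensor.h>
#include <cstdint>
#include <vector>

void contract(const float* A_d, const float* B_d, float* C_d,
              int64_t m, int64_t n, int64_t k, cudaStream_t stream)
{
    cutensorHandle_t handle;
    cutensorCreate(&handle);

    // Mode labels name each tensor dimension; shared labels ('k') are contracted.
    std::vector<int32_t> modeA{'m', 'k'}, modeB{'k', 'n'}, modeC{'m', 'n'};
    std::vector<int64_t> extentA{m, k}, extentB{k, n}, extentC{m, n};
    const uint32_t kAlignment = 128;  // alignment of the device pointers, in bytes

    cutensorTensorDescriptor_t descA, descB, descC;
    cutensorCreateTensorDescriptor(handle, &descA, 2, extentA.data(), nullptr, CUTENSOR_R_32F, kAlignment);
    cutensorCreateTensorDescriptor(handle, &descB, 2, extentB.data(), nullptr, CUTENSOR_R_32F, kAlignment);
    cutensorCreateTensorDescriptor(handle, &descC, 2, extentC.data(), nullptr, CUTENSOR_R_32F, kAlignment);

    // Describe the contraction D = alpha * op(A) * op(B) + beta * op(C); D aliases C here.
    cutensorOperationDescriptor_t desc;
    cutensorCreateContraction(handle, &desc,
                              descA, modeA.data(), CUTENSOR_OP_IDENTITY,
                              descB, modeB.data(), CUTENSOR_OP_IDENTITY,
                              descC, modeC.data(), CUTENSOR_OP_IDENTITY,
                              descC, modeC.data(),
                              CUTENSOR_COMPUTE_DESC_32F);

    // Kernel selection; CUTENSOR_JIT_MODE_DEFAULT instead of NONE requests a JIT-compiled kernel.
    cutensorPlanPreference_t pref;
    cutensorCreatePlanPreference(handle, &pref, CUTENSOR_ALGO_DEFAULT, CUTENSOR_JIT_MODE_NONE);

    uint64_t workspaceSize = 0;
    cutensorEstimateWorkspaceSize(handle, desc, pref, CUTENSOR_WORKSPACE_DEFAULT, &workspaceSize);

    cutensorPlan_t plan;
    cutensorCreatePlan(handle, &plan, desc, pref, workspaceSize);

    void* work = nullptr;
    cudaMalloc(&work, workspaceSize);

    const float alpha = 1.0f, beta = 0.0f;
    cutensorContract(handle, plan, &alpha, A_d, B_d, &beta, C_d, C_d, work, workspaceSize, stream);

    cudaFree(work);
    cutensorDestroyPlan(plan);
    cutensorDestroyOperationDescriptor(desc);
    cutensorDestroyPlanPreference(pref);
    cutensorDestroyTensorDescriptor(descA);
    cutensorDestroyTensorDescriptor(descB);
    cutensorDestroyTensorDescriptor(descC);
    cutensorDestroy(handle);
}
```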
For over a decade, traditional industrial process modeling and simulation approaches have struggled to fully leverage multicore CPUs or acceleration devices to run simulation and optimization calculations in parallel. Multicore linear solvers used in process modeling and simulation have not achieved expected improvements, and in certain cases have underperformed optimized single-core solvers.
High-performance computing (HPC) powers applications in simulation and modeling, healthcare and life sciences, industry and engineering, and more. In the modern data center, HPC synergizes with AI, harnessing data in transformative new ways. The performance and throughput demands of next-generation HPC applications call for an accelerated computing platform that can handle diverse workloads…
The latest release of CUDA Toolkit 12.2 introduces a range of essential new features, modifications to the programming model, and enhanced support for hardware capabilities accelerating CUDA applications. Now generally available from NVIDIA, CUDA Toolkit 12.2 includes many new capabilities, both major and minor. The following post offers an overview of many of the key…
CUDA Toolkit 12.0 introduces a new nvJitLink library for Just-in-Time Link Time Optimization (JIT LTO) support. In the early days of CUDA, to get maximum performance, developers had to build and compile CUDA kernels as a single source file in whole program compilation mode. This limited SDKs and applications with large swaths of code, spanning multiple files that required separate compilation, from porting…
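As a rough sketch of what JIT LTO with nvJitLink can look like, the fragment below links two pieces of LTO-IR at run time into a single cubin. The file names a.ltoir and b.ltoir are hypothetical placeholders for device code compiled offline with LTO enabled, error checking is omitted, and the option strings and enum values should be verified against the nvJitLink documentation.

```
// Sketch: runtime (JIT) link-time optimization with nvJitLink.
// Assumes a.ltoir and b.ltoir contain LTO-IR generated offline, e.g. with nvcc -dc -dlto.
#include <nvJitLink.h>
#include <cstdio>
#include <cstdlib>

int main()
{
    nvJitLinkHandle handle;
    const char* options[] = {"-lto", "-arch=sm_80"};   // optimize during linking, target SM 8.0
    nvJitLinkCreate(&handle, 2, options);

    // Add the separately compiled pieces of device code to the link.
    nvJitLinkAddFile(handle, NVJITLINK_INPUT_LTOIR, "a.ltoir");
    nvJitLinkAddFile(handle, NVJITLINK_INPUT_LTOIR, "b.ltoir");

    // Run link-time optimization across all inputs and retrieve the linked cubin.
    nvJitLinkComplete(handle);

    size_t cubinSize = 0;
    nvJitLinkGetLinkedCubinSize(handle, &cubinSize);
    void* cubin = std::malloc(cubinSize);
    nvJitLinkGetLinkedCubin(handle, cubin);

    // The cubin can now be loaded with the CUDA driver API (e.g. cuModuleLoadData) and launched.
    std::printf("linked cubin: %zu bytes\n", cubinSize);

    nvJitLinkDestroy(&handle);
    std::free(cubin);
    return 0;
}
```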
NVIDIA announces the newest CUDA Toolkit software release, 12.0. This release is the first major release in many years and focuses on new programming models and CUDA application acceleration through new hardware capabilities. For more information, watch the YouTube Premiere webinar, CUDA 12.0: New Features and Beyond. You can now target architecture-specific features and instructions…
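As a small, hypothetical illustration of targeting architecture-specific capabilities, the sketch below guards a Hopper-only code path with __CUDA_ARCH__ and is intended to be built with an architecture-specific target such as -arch=sm_90a; the kernel body is a placeholder rather than an actual Hopper-specific instruction sequence.

```
// Sketch: guarding an architecture-specific code path.
// Compile with, for example:  nvcc -arch=sm_90a arch_specific.cu
// The "a" suffix requests architecture-specific (non-portable) features of the target GPU.
#include <cstdio>

__global__ void kernel(float* out)
{
#if __CUDA_ARCH__ >= 900
    // Hopper-or-newer path: this is where sm_90a-only instructions/intrinsics would go.
    out[threadIdx.x] = 2.0f * threadIdx.x;
#else
    // Fallback path for older architectures.
    out[threadIdx.x] = static_cast<float>(threadIdx.x);
#endif
}

int main()
{
    float* out = nullptr;
    cudaMalloc(&out, 32 * sizeof(float));
    kernel<<<1, 32>>>(out);
    cudaDeviceSynchronize();
    cudaFree(out);
    std::puts("done");
    return 0;
}
```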
NVIDIA announces the newest release of the CUDA development environment, CUDA 11.6. This release is focused on enhancing the programming model and performance of your CUDA applications. CUDA continues to push the boundaries of GPU acceleration and lay the foundation for new applications in HPC, visualization, AI, ML and DL, and data science. CUDA 11.6 has several important features.
NVIDIA announces the newest release of the CUDA development environment, CUDA 11.5. CUDA 11.5 is focused on enhancing the programming model and performance of your CUDA applications. CUDA continues to push the boundaries of GPU acceleration and lay the foundation for new applications in HPC, visualization, AI, ML and DL, and data science. CUDA 11.5 has several important features.
WSL2 is available on Windows 11 outside the Windows Insider Preview. For more information about what is supported, see the CUDA on WSL User Guide. In June 2020, we released the first NVIDIA Display Driver that enabled GPU acceleration in the Windows Subsystem for Linux (WSL) 2 for Windows Insider Program (WIP) Preview users. At that time, it was still an early preview with a limited set of…
The CUDA 11.3 release of the CUDA C++ compiler toolchain incorporates new features aimed at improving developer productivity and code performance. NVIDIA is introducing cu++flt, a standalone demangler tool that allows you to decode mangled function names to aid source code correlation. Starting with this release, the NVRTC shared library versioning scheme is relaxed to facilitate compatible…
The 11.2 CUDA C++ compiler incorporates features and enhancements aimed at improving developer productivity and the performance of GPU-accelerated applications. The compiler toolchain gets an LLVM upgrade to 7.0, which enables new features and can help improve compiler code generation for NVIDIA GPUs. Link-time optimization (LTO) for device code (also known as device LTO)…
CUDA 11.2 features powerful link-time optimization (LTO) for device code in GPU-accelerated applications. Device LTO brings the performance advantages of device code optimization that were previously only possible in whole-program compilation mode to the separate compilation mode, which was introduced in CUDA 5.0. Separate compilation mode allows CUDA device kernel code to span across…
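As a sketch of how separate compilation and device LTO fit together, the hypothetical two-file example below defines a __device__ function in one translation unit and calls it from a kernel in another; the nvcc commands in the comments show a typical -dc/-dlto build and are worth double-checking against the nvcc documentation for your toolkit version.

```
// Sketch: device code split across two translation units, built with separate
// compilation plus device link-time optimization (device LTO). Build commands:
//   nvcc -dc -dlto particle.cu main.cu     // compile each file to LTO-capable objects
//   nvcc -dlto particle.o main.o -o app    // device-link with LTO, then host-link

// ---------------- particle.cu ----------------
__device__ float scale(float x)   // defined here, called from a kernel in main.cu
{
    return 2.0f * x;
}

// ---------------- main.cu ----------------
#include <cstdio>

__device__ float scale(float x);  // declaration only; resolved at device link time

__global__ void apply(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = scale(data[i]);   // with device LTO this call can be inlined
}

int main()
{
    const int n = 256;
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));
    apply<<<(n + 127) / 128, 128>>>(d, n);
    cudaDeviceSynchronize();
    cudaFree(d);
    std::puts("done");
    return 0;
}
```

With whole-program compilation, the call to scale could always be optimized together with its caller; device LTO restores that opportunity even though the definition lives in a separate file.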