Matthew Nicely – NVIDIA Technical Blog News and tutorials for developers, data scientists, and IT admins 2025-02-06T19:33:52Z http://www.open-lab.net/blog/feed/ Matthew Nicely <![CDATA[Just Released: CUTLASS 3.8]]> http://www.open-lab.net/blog/?p=95716 2025-02-06T19:33:50Z 2025-02-03T23:54:16Z Provides support for the NVIDIA Blackwell SM100 architecture. CUTLASS is a collection of CUDA C++ templates and abstractions for implementing high-performance...]]>

Provides support for the NVIDIA Blackwell SM100 architecture. CUTLASS is a collection of CUDA C++ templates and abstractions for implementing high-performance GEMM computations.

Matthew Nicely <![CDATA[Just Released: NVIDIA cuDNN 9.7]]> http://www.open-lab.net/blog/?p=95670 2025-02-06T19:33:52Z 2025-01-31T21:23:42Z Bringing support for NVIDIA Blackwell architecture across data center and GeForce products, NVIDIA cuDNN 9.7 delivers speedups of up to 84% for FP8 Flash...]]>

Bringing support for NVIDIA Blackwell architecture across data center and GeForce products, NVIDIA cuDNN 9.7 delivers speedups of up to 84% for FP8 Flash Attention operations and optimized GEMM capabilities with advanced fusion support to accelerate deep learning workloads.

Matthew Nicely <![CDATA[Accelerating Transformers with NVIDIA cuDNN 9]]> http://www.open-lab.net/blog/?p=82592 2024-05-30T19:55:46Z 2024-05-24T16:00:00Z The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library for accelerating deep learning primitives with state-of-the-art performance....]]>

The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library for accelerating deep learning primitives with state-of-the-art performance. cuDNN is integrated with popular deep learning frameworks like PyTorch, TensorFlow, and XLA (Accelerated Linear Algebra). These frameworks abstract the complexities of direct GPU programming, enabling you to focus on designing and…

Matthew Nicely <![CDATA[CUDA Toolkit 12.0 Released for General Availability]]> http://www.open-lab.net/blog/?p=58508 2024-08-28T17:43:25Z 2022-12-12T19:00:00Z NVIDIA announces the newest CUDA Toolkit software release, 12.0. This release is the first major release in many years and it focuses on new programming models...]]>

NVIDIA announces the newest CUDA Toolkit software release, 12.0. This release is the first major release in many years and it focuses on new programming models and CUDA application acceleration through new hardware capabilities. For more information, watch the YouTube Premiere webinar, CUDA 12.0: New Features and Beyond. You can now target architecture-specific features and instructions…

Matthew Nicely <![CDATA[Just Released: cuTENSOR V1.5]]> http://www.open-lab.net/blog/?p=49386 2023-06-12T09:27:18Z 2022-06-21T20:27:24Z

Matthew Nicely <![CDATA[Just Released: cuSPARSELt v0.3]]> http://www.open-lab.net/blog/?p=49323 2023-06-12T09:28:20Z 2022-06-20T15:00:00Z

Matthew Nicely <![CDATA[Programming Distributed Multi-GPU Tensor Operations with cuTENSOR v1.4]]> http://www.open-lab.net/blog/?p=40104 2023-05-22T19:55:40Z 2021-11-29T14:24:35Z Today, NVIDIA is announcing the availability of cuTENSOR, version 1.4, which supports up to 64-dimensional tensors, distributed multi-GPU tensor...]]>

Today, NVIDIA is announcing the availability of cuTENSOR, version 1.4, which supports up to 64-dimensional tensors, distributed multi-GPU tensor operations, and helps improve tensor contraction performance models. This software can be downloaded now free of charge. Download the cuTENSOR software. For more information, see the cuTENSOR Release Notes. cuTENSOR is a high…

Matthew Nicely <![CDATA[Implementing High Performance Matrix Multiplication Using CUTLASS v2.8]]> http://www.open-lab.net/blog/?p=41581 2023-05-22T19:56:01Z 2021-11-23T14:35:39Z NVIDIA continues to enhance CUTLASS to provide extensive support for mixed-precision computations, providing specialized data-movement, and multiply-accumulate...]]>

NVIDIA continues to enhance CUTLASS to provide extensive support for mixed-precision computations, providing specialized data-movement, and multiply-accumulate abstractions. Today, NVIDIA is announcing the availability of CUTLASS version 2.8. Download the free CUTLASS v2.8 software. See the CUTLASS Release Notes for more information. CUTLASS is a collection of CUDA…

Matthew Nicely <![CDATA[Using Fully Redesigned Batch API and Performance Optimizations in nvCOMP v2.1.0]]> http://www.open-lab.net/blog/?p=40826 2022-08-21T23:53:05Z 2021-11-15T23:30:00Z Today, NVIDIA is announcing the availability of nvCOMP, version 2.1.0. This software can be downloaded now free of charge. Download now. What's New?...]]>

Today, NVIDIA is announcing the availability of nvCOMP, version 2.1.0. This software can be downloaded now free of charge. See the nvCOMP Release Notes for more information. nvCOMP is a CUDA library that features generic compression interfaces to enable developers to use high-performance GPU compressors in their applications.

Matthew Nicely <![CDATA[Accelerating ReLu and GeLu Activation Functions, and Batched Sparse GEMM in cuSPARSELt v0.2.0]]> http://www.open-lab.net/blog/?p=39713 2023-06-12T21:07:28Z 2021-11-15T23:30:00Z Today, NVIDIA is announcing the availability of cuSPARSELt, version 0.2.0, which increases performance on activation functions, bias vectors, and Batched Sparse...]]>

Today, NVIDIA is announcing the availability of cuSPARSELt, version 0.2.0, which increases performance on activation functions, bias vectors, and Batched Sparse GEMM. This software can be downloaded now free of charge. Download the cuSPARSELt software. For more technical information, see the cuSPARSELt Release Notes. NVIDIA cuSPARSELt is a high-performance CUDA…

Matthew Nicely <![CDATA[cuSOLVERMp v0.0.1 Now Available: Through Early Access]]> http://www.open-lab.net/blog/?p=31511 2023-06-12T21:10:50Z 2021-05-10T19:15:30Z Today, cuSOLVERMp version 0.0.1 is now available at no charge for members of the NVIDIA Developer Program. Download Now What’s New Support for LU solver, with...]]>

cuSOLVERMp version 0.0.1 is now available at no charge for members of the NVIDIA Developer Program. cuSOLVERMp provides a distributed-memory, multi-node, multi-GPU solution for solving systems of linear equations at scale. In the future, it will also solve eigenvalue and singular value problems.

Matthew Nicely <![CDATA[nvCOMP v2.0.0 Now Available: With New Compressors]]> http://www.open-lab.net/blog/?p=31094 2022-08-21T23:51:32Z 2021-04-30T15:00:00Z Today, NVIDIA is announcing the availability of nvCOMP version 2.0.0. This software can be downloaded now free for members of the NVIDIA Developer Program....]]>

Today, NVIDIA is announcing the availability of nvCOMP version 2.0.0. This software can be downloaded now free for members of the NVIDIA Developer Program. See the nvCOMP Release Notes for more information. nvCOMP is a CUDA library that features generic compression interfaces to enable developers to use high-performance GPU…

Matthew Nicely <![CDATA[HPL-AI Now Runs 2x Faster on NVIDIA DGX A100]]> http://www.open-lab.net/blog/?p=31088 2022-08-21T23:51:31Z 2021-04-28T16:07:34Z NVIDIA announced its latest update to the HPL-AI Benchmark version 2.0.0, which will reside in the HPC-Benchmarks container version 21.4. The HPL-AI (High...]]>

NVIDIA announced its latest update to the HPL-AI Benchmark version 2.0.0, which will reside in the HPC-Benchmarks container version 21.4. The HPL-AI (High Performance Linpack – Artificial Intelligence) benchmark helps evaluate the convergence of HPC and data-driven AI workloads. Historically, HPC workloads are benchmarked at double-precision, representing the accuracy requirements in…

Matthew Nicely <![CDATA[cuTENSOR v1.3.0 Now Available: Up to 2x Performance]]> http://www.open-lab.net/blog/?p=31091 2022-08-21T23:51:32Z 2021-04-28T16:00:00Z Today, NVIDIA is announcing the availability of cuTENSOR version 1.3.0. This software can be downloaded now free for members of the NVIDIA Developer Program....]]>

Today, NVIDIA is announcing the availability of cuTENSOR version 1.3.0. This software can be downloaded now free for members of the NVIDIA Developer Program. See the cuTENSOR Release Notes for more information. cuTENSOR is a high-performance CUDA library for tensor primitives…

Matthew Nicely <![CDATA[cuSPARSELt v0.1.0 Now Available: Arm and Windows Support]]> http://www.open-lab.net/blog/?p=31048 2022-08-21T23:51:31Z 2021-04-23T22:57:01Z Today, NVIDIA is announcing the availability of cuSPARSELt version 0.1.0. This software can be downloaded now free for members of the NVIDIA Developer Program....]]>

Today, NVIDIA is announcing the availability of cuSPARSELt version 0.1.0. This software can be downloaded now free for members of the NVIDIA Developer Program. See the cuSPARSELt Release Notes for more information. NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a…

Matthew Nicely <![CDATA[Unifying the CUDA Python Ecosystem]]> http://www.open-lab.net/blog/?p=30340 2023-03-22T01:11:58Z 2021-04-12T19:00:00Z Python plays a key role within the science, engineering, data analytics, and deep learning application ecosystem. NVIDIA has long been committed to helping the...]]>

Python plays a key role within the science, engineering, data analytics, and deep learning application ecosystem. NVIDIA has long been committed to helping the Python ecosystem leverage the accelerated massively parallel performance of GPUs to deliver standardized libraries, tools, and applications. Today, we’re introducing another step towards simplification of the developer experience with…
