CUTLASS – NVIDIA Technical Blog
News and tutorials for developers, data scientists, and IT admins
http://www.open-lab.net/blog/feed/

CUTLASS 3.x: Orthogonal, Reusable, and Composable Abstractions for GEMM Kernel Design
By Cris Cecka | Published 2025-07-16

GEMM optimization on GPUs is a modular problem. Performant implementations need to specify hyperparameters such as tile shapes, math and copy instructions, and...
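The modularity described above can be illustrated outside CUTLASS with a plain C++ sketch, where the tile shape is a compile-time hyperparameter of a blocked GEMM. This is not CUTLASS code (a real kernel maps these tiles onto thread blocks and MMA instructions); it only shows how tile shapes parameterize the computation:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Blocked GEMM: C (MxN) += A (MxK) * B (KxN), all row-major.
// TileM/TileN/TileK stand in for the tile-shape hyperparameters that a
// CUTLASS kernel would bind to hardware resources; here they only
// control the loop blocking of a reference implementation.
template <std::size_t TileM, std::size_t TileN, std::size_t TileK>
void tiled_gemm(std::size_t M, std::size_t N, std::size_t K,
                const std::vector<float>& A,
                const std::vector<float>& B,
                std::vector<float>& C) {
  for (std::size_t mi = 0; mi < M; mi += TileM)
    for (std::size_t ni = 0; ni < N; ni += TileN)
      for (std::size_t ki = 0; ki < K; ki += TileK)
        // Accumulate one TileM x TileN tile of C for this K-slice.
        for (std::size_t m = mi; m < std::min(mi + TileM, M); ++m)
          for (std::size_t n = ni; n < std::min(ni + TileN, N); ++n) {
            float acc = 0.0f;
            for (std::size_t k = ki; k < std::min(ki + TileK, K); ++k)
              acc += A[m * K + k] * B[k * N + n];
            C[m * N + n] += acc;
          }
}
```

Swapping the template arguments changes the blocking without touching the algorithm, which is the orthogonality the post's title refers to.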

Source

CUTLASS: Principled Abstractions for Handling Multidimensional Data Through Tensors and Spatial Microkernels
By Vijay Thakkar | Published 2025-07-16

In the era of generative AI, utilizing GPUs to their maximum potential is essential to training better models and serving users at scale. Often, these models have layers that cannot be expressed as off-the-shelf library operations due to subtle modifications, and DL compilers typically forgo the last few percentage points of optimizations to make their deployment feasible.
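CUTLASS 3.x handles multidimensional data through layouts that map logical coordinates to memory offsets via shapes and strides. The following is a library-free C++ sketch of that shape/stride idea, not the actual CuTe API (the struct and field names are illustrative):

```cpp
#include <array>
#include <cstddef>

// A rank-2 layout: maps a logical (row, col) coordinate to a linear
// element offset through per-mode strides -- the shape/stride arithmetic
// that CUTLASS's tensor abstractions are built on.
struct Layout2D {
  std::array<std::size_t, 2> shape;   // extent of each mode
  std::array<std::size_t, 2> stride;  // step in elements per mode

  std::size_t operator()(std::size_t i, std::size_t j) const {
    return i * stride[0] + j * stride[1];
  }
};
```

Row-major and column-major views of the same 3x4 buffer share a shape and differ only in strides, e.g. `{4, 1}` versus `{1, 3}`, so the same kernel code can address either without branching.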

Source

Next Generation of FlashAttention
By Vijay Thakkar | Published 2024-07-11

NVIDIA is excited to collaborate with Colfax, Together.ai, Meta, and Princeton University on their recent achievement to exploit the Hopper GPU architecture and Tensor Cores and accelerate key Fused Attention kernels using CUTLASS 3. FlashAttention-3 incorporates key techniques to achieve 1.5-2.0x faster performance than FlashAttention-2 with FP16, up to 740 TFLOPS. With FP8...
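For context, the computation FlashAttention fuses is single-head scaled dot-product attention, O = softmax(QK^T / sqrt(d)) V. A minimal C++ reference of that math (illustrative only; the actual kernels tile and fuse these loops so the n x n score matrix never materializes in global memory):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference single-head attention: O = softmax(Q K^T / sqrt(d)) V.
// Q, K, V are (n x d), row-major. Uses max-subtraction for a
// numerically stable softmax, as fused attention kernels also do.
std::vector<float> attention(std::size_t n, std::size_t d,
                             const std::vector<float>& Q,
                             const std::vector<float>& K,
                             const std::vector<float>& V) {
  std::vector<float> O(n * d, 0.0f);
  const float scale = 1.0f / std::sqrt(static_cast<float>(d));
  for (std::size_t i = 0; i < n; ++i) {
    // Scaled scores of query i against every key.
    std::vector<float> s(n);
    float m = -INFINITY;
    for (std::size_t j = 0; j < n; ++j) {
      float dot = 0.0f;
      for (std::size_t k = 0; k < d; ++k) dot += Q[i * d + k] * K[j * d + k];
      s[j] = dot * scale;
      m = std::max(m, s[j]);
    }
    // Stable softmax over the scores.
    float denom = 0.0f;
    for (std::size_t j = 0; j < n; ++j) { s[j] = std::exp(s[j] - m); denom += s[j]; }
    // Weighted sum of value rows.
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t k = 0; k < d; ++k)
        O[i * d + k] += (s[j] / denom) * V[j * d + k];
  }
  return O;
}
```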

Source
