CUTLASS

Jul 16, 2025
CUTLASS 3.x: Orthogonal, Reusable, and Composable Abstractions for GEMM Kernel Design
GEMM optimization on GPUs is a modular problem. Performant implementations need to specify hyperparameters such as tile shapes, math and copy instructions, and...
12 MIN READ

Jul 16, 2025
CUTLASS: Principled Abstractions for Handling Multidimensional Data Through Tensors and Spatial Microkernels
In the era of generative AI, utilizing GPUs to their maximum potential is essential to training better models and serving users at scale. Often, these models...
12 MIN READ

Jul 11, 2024
Next Generation of FlashAttention
NVIDIA is excited to collaborate with Colfax, Together.ai, Meta, and Princeton University on their recent achievement to exploit the Hopper GPU architecture and...
1 MIN READ