Vijay Thakkar

Vijay Thakkar is a senior compute architect at NVIDIA and the primary author of CUTLASS 3. In addition to his work on CUTLASS, he is involved in developing the Tensor Core architecture, PTX exposure, and programming model, working across the GPU architecture, compiler, and CUDA engineering teams.

Posts by Vijay Thakkar

Models / Libraries / Frameworks Jul 16, 2025

CUTLASS 3.x: Orthogonal, Reusable, and Composable Abstractions for GEMM Kernel Design

GEMM optimization on GPUs is a modular problem. Performant implementations need to specify hyperparameters such as tile shapes, math and copy instructions, and... 12 MIN READ

Models / Libraries / Frameworks Jul 16, 2025

CUTLASS: Principled Abstractions for Handling Multidimensional Data Through Tensors and Spatial Microkernels

In the era of generative AI, utilizing GPUs to their maximum potential is essential to training better models and serving users at scale. Often, these models... 12 MIN READ

Generative AI Jul 11, 2024

Next Generation of FlashAttention

NVIDIA is excited to collaborate with Colfax, Together.ai, Meta, and Princeton University on their recent achievement to exploit the Hopper GPU architecture and... 1 MIN READ