Posts by Vinod Grover
Development & Optimization
Jun 13, 2025
Run High-Performance LLM Inference Kernels from NVIDIA Using FlashInfer
Best-in-class LLM inference requires two key elements: speed and developer velocity. Speed refers to maximizing the efficiency of the underlying hardware by...
6 MIN READ
Simulation / Modeling / Design
Jan 15, 2018
Using CUDA Warp-Level Primitives
NVIDIA GPUs execute groups of threads known as warps in SIMT (Single Instruction, Multiple Thread) fashion. Many CUDA programs achieve high performance by...
16 MIN READ
Simulation / Modeling / Design
Nov 07, 2016
New Compiler Features in CUDA 8
CUDA 8 is one of the most significant updates in the history of the CUDA platform. In addition to Unified Memory and the many new API and library features in...
17 MIN READ