Zihao Ye

Zihao Ye is a senior compiler engineer at NVIDIA and a PhD student at the University of Washington. His research interests include efficient LLM inference and machine learning compilers.

Posts by Zihao Ye

Development & Optimization Jun 13, 2025

Run High-Performance LLM Inference Kernels from NVIDIA Using FlashInfer

Best-in-class LLM inference requires two key elements: speed and developer velocity. Speed refers to maximizing the efficiency of the underlying hardware by... 6 MIN READ