Zihao Ye

Zihao Ye is a senior compiler engineer at NVIDIA and a PhD student at the University of Washington. His research interests include efficient LLM inference and machine learning compilers.

Posts by Zihao Ye

Development & Optimization

Run High-Performance LLM Inference Kernels from NVIDIA Using FlashInfer

Best-in-class LLM inference requires two key elements: speed and developer velocity. Speed refers to maximizing the efficiency of the underlying hardware by...

6 MIN READ