Haohang Huang

Haohang Huang is a senior AI developer technology engineer at NVIDIA. He works on accelerating GenAI applications on GPUs, with a focus on computer vision and large language models. He received his Ph.D. from the University of Illinois Urbana-Champaign.

Posts by Haohang Huang

Generative AI

NVIDIA TensorRT-LLM Now Supports Recurrent Drafting for Optimizing LLM Inference

Recurrent drafting (referred to as ReDrafter) is a novel speculative decoding technique developed and open-sourced by Apple for large language model (LLM)...
Generative AI

NVIDIA TensorRT-LLM Now Accelerates Encoder-Decoder Models with In-Flight Batching

NVIDIA recently announced that NVIDIA TensorRT-LLM now accelerates encoder-decoder model architectures. TensorRT-LLM is an open-source library that optimizes...