Haohang Huang

Haohang Huang is a senior AI developer technology engineer at NVIDIA. He works on accelerating GenAI applications on GPUs, with a focus on computer vision and large language models. He received his Ph.D. from the University of Illinois Urbana-Champaign.

Posts by Haohang Huang

Generative AI Dec 18, 2024

NVIDIA TensorRT-LLM Now Supports Recurrent Drafting for Optimizing LLM Inference

Recurrent drafting (referred to as ReDrafter) is a novel speculative decoding technique developed and open-sourced by Apple for large language model (LLM)... 6 MIN READ

Generative AI Dec 11, 2024

NVIDIA TensorRT-LLM Now Accelerates Encoder-Decoder Models with In-Flight Batching

NVIDIA recently announced that NVIDIA TensorRT-LLM now accelerates encoder-decoder model architectures. TensorRT-LLM is an open-source library that optimizes... 4 MIN READ