Disha Mehra – NVIDIA Technical Blog
News and tutorials for developers, data scientists, and IT admins
Feed updated 2025-03-11T01:44:00Z: http://www.open-lab.net/blog/feed/

NVIDIA TensorRT-LLM Now Supports Recurrent Drafting for Optimizing LLM Inference
By Disha Mehra. Published 2024-12-18T17:31:01Z, updated 2025-03-11T01:44:00Z.
http://www.open-lab.net/blog/?p=92963

Recurrent drafting (ReDrafter), a novel speculative decoding technique for large language model (LLM) inference that Apple developed and open-sourced, is now available in NVIDIA TensorRT-LLM. ReDrafter helps developers significantly boost LLM workload performance on NVIDIA GPUs. NVIDIA TensorRT-LLM is a library for optimizing LLM inference. It provides an easy-to-use Python API to…
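Speculative decoding lets a cheap drafter propose several tokens that the full (target) model then checks, so each expensive step can emit more than one token. The sketch below shows only the generic greedy verification step, assuming a hypothetical `target_next` function that returns the target model's greedy next token. It is not ReDrafter's algorithm (which uses its own recurrent draft head) and not a TensorRT-LLM API.

```python
from typing import Callable, List

# Hypothetical stand-in for the target model: maps a token prefix to its
# greedy next token. A real engine scores all draft positions in one
# batched forward pass; the sequential calls here are only for clarity.
NextToken = Callable[[List[int]], int]


def verify_draft(prefix: List[int], draft: List[int], target_next: NextToken) -> List[int]:
    """Generic greedy speculative-decoding verification (illustrative sketch).

    Accepts the longest prefix of `draft` that matches the target model's
    greedy choices, then appends one target token: the correction at the
    first mismatch, or a bonus token if every draft token was accepted.
    Always returns at least one new token.
    """
    accepted: List[int] = []
    context = list(prefix)
    for tok in draft:
        expected = target_next(context)
        if tok != expected:
            # First mismatch: keep the target's token instead and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        context.append(tok)
    # All draft tokens matched, so the target contributes one extra token.
    accepted.append(target_next(context))
    return accepted


def toy_target(ctx: List[int]) -> int:
    # Toy "model" for demonstration: always predicts the last token plus one.
    return ctx[-1] + 1


if __name__ == "__main__":
    # Draft tokens 3 and 4 are accepted; 9 is rejected and replaced by 5.
    print(verify_draft([1, 2], [3, 4, 9, 10], toy_target))
```

With good drafts, one verification pass yields several tokens instead of one, which is where speculative methods such as ReDrafter get their speedup.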

Building and Deploying Conversational AI Models Using NVIDIA TAO Toolkit
By Disha Mehra. Published 2021-11-09T16:15:24Z, updated 2023-03-22T01:16:50Z.
http://www.open-lab.net/blog/?p=24079

Conversational AI is a set of technologies that enables human-like interactions between humans and devices through the most natural interfaces we have: speech and natural language. Systems based on conversational AI can understand commands by recognizing speech and text, translating on the fly between different languages…
