In today’s data-driven world, the ability to retrieve accurate information from even modest amounts of data is vital for developers seeking streamlined, effective solutions for quick deployments, prototyping, or experimentation. One of the key challenges in information retrieval is managing the diverse modalities in unstructured datasets, including text, PDFs, images, tables, audio, video…
]]>As of March 18, 2025, NVIDIA Triton Inference Server is now part of the NVIDIA Dynamo Platform and has been renamed to NVIDIA Dynamo Triton, accordingly. Large language models (LLMs) have been widely used for chatbots, content generation, summarization, classification, translation, and more. State-of-the-art LLMs and foundation models, such as Llama, Gemma, GPT, and Nemotron…
]]>Speech AI applications, from call centers to virtual assistants, rely heavily on automatic speech recognition (ASR) and text-to-speech (TTS). ASR can process the audio signal and transcribe the audio to text. Speech synthesis or TTS can generate high-quality, natural-sounding audio from the text in real time. The challenge of Speech AI is to achieve high accuracy and meet the latency requirements…
]]>Multi-Instance GPU (MIG) is an important feature of NVIDIA H100, A100, and A30 Tensor Core GPUs, as it can partition a GPU into multiple instances. Each instance has its own compute cores, high-bandwidth memory, L2 cache, DRAM bandwidth, and media engines such as decoders. This enables multiple workloads or multiple users to run workloads simultaneously on one GPU to maximize the GPU…
]]>NVIDIA A30 GPU is built on the latest NVIDIA Ampere Architecture to accelerate diverse workloads like AI inference at scale, enterprise training, and HPC applications for mainstream servers in data centers. The A30 PCIe card combines the third-generation Tensor Cores with large HBM2 memory (24 GB) and fast GPU memory bandwidth (933 GB/s) in a low-power envelope (maximum 165 W).
]]>Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. As of March 18, 2025, NVIDIA Triton Inference Server is now part of the NVIDIA Dynamo Platform and has been renamed to NVIDIA Dynamo Triton, accordingly. NVIDIA Triton Inference Server is an open-source AI model serving software that…
]]>With the third-generation Tensor Core technology, NVIDIA recently unveiled A100 Tensor Core GPU that delivers unprecedented acceleration at every scale for AI, data analytics, and high-performance computing. Along with the great performance increase over prior generation GPUs comes another groundbreaking innovation, Multi-Instance GPU (MIG). With MIG, each A100 GPU can be partitioned up to seven…
]]>Recent conversational AI research has demonstrated automatically generating high quality, human-like audio from text. For example, you can use Tacotron 2 and WaveGlow to convert text into high quality, natural-sounding speech in real time. You can also use FastPitch to generate mel spectrograms in parallel, achieving good speedup compared to Tacotron 2. However, current text-to-speech models…
]]>Sign up for the latest Speech AI News from NVIDIA. This post, intended for developers with professional level understanding of deep learning, will help you produce a production-ready, AI, text-to-speech model. Converting text into high quality, natural-sounding speech in real time has been a challenging conversational AI task for decades. State-of-the-art speech synthesis models are based on…
]]>