Maggie Zhang – NVIDIA Technical Blog

Maggie Zhang – NVIDIA Technical Blog News and tutorials for developers, data scientists, and IT admins 2025-03-18T18:20:18Z http://www.open-lab.net/blog/feed/ Maggie Zhang <![CDATA[Building a Simple VLM-Based Multimodal Information Retrieval System with NVIDIA NIM]]> http://www.open-lab.net/blog/?p=96151 2025-03-06T19:26:45Z 2025-02-26T17:00:00Z

In today’s data-driven world, the ability to retrieve accurate information from even modest amounts of data is vital for developers seeking streamlined,...]]>

In today’s data-driven world, the ability to retrieve accurate information from even modest amounts of data is vital for developers seeking streamlined, effective solutions for quick deployments, prototyping, or experimentation. One of the key challenges in information retrieval is managing the diverse modalities in unstructured datasets, including text, PDFs, images, tables, audio, video…

]]> 1 Maggie Zhang <![CDATA[Scaling LLMs with NVIDIA Triton and NVIDIA TensorRT-LLM Using Kubernetes]]> http://www.open-lab.net/blog/?p=90412 2025-03-18T18:18:17Z 2024-10-22T16:53:55Z

Large language models (LLMs) have been widely used for chatbots, content generation, summarization, classification, translation, and more. State-of-the-art LLMs...]]>

As of March 18, 2025, NVIDIA Triton Inference Server is now part of the NVIDIA Dynamo Platform and has been renamed to NVIDIA Dynamo Triton, accordingly. Large language models (LLMs) have been widely used for chatbots, content generation, summarization, classification, translation, and more. State-of-the-art LLMs and foundation models, such as Llama, Gemma, GPT, and Nemotron…

]]> Maggie Zhang <![CDATA[Autoscaling NVIDIA Riva Deployment with Kubernetes for Speech AI in Production]]> http://www.open-lab.net/blog/?p=59514 2023-10-20T18:16:30Z 2023-01-12T17:30:00Z

Speech AI applications, from call centers to virtual assistants, rely heavily on automatic speech recognition (ASR) and text-to-speech (TTS). ASR can process...]]>

Speech AI applications, from call centers to virtual assistants, rely heavily on automatic speech recognition (ASR) and text-to-speech (TTS). ASR can process the audio signal and transcribe the audio to text. Speech synthesis or TTS can generate high-quality, natural-sounding audio from the text in real time. The challenge of Speech AI is to achieve high accuracy and meet the latency requirements…

]]> 0 Maggie Zhang <![CDATA[Dividing NVIDIA A30 GPUs and Conquering Multiple Workloads]]> http://www.open-lab.net/blog/?p=50380 2023-04-04T16:58:51Z 2022-08-30T19:00:35Z

Multi-Instance GPU (MIG) is an important feature of NVIDIA H100, A100, and A30 Tensor Core GPUs, as it can partition a GPU into multiple instances. Each...]]>

Multi-Instance GPU (MIG) is an important feature of NVIDIA H100, A100, and A30 Tensor Core GPUs, as it can partition a GPU into multiple instances. Each instance has its own compute cores, high-bandwidth memory, L2 cache, DRAM bandwidth, and media engines such as decoders. This enables multiple workloads or multiple users to run workloads simultaneously on one GPU to maximize the GPU…

]]> 0 Maggie Zhang <![CDATA[Accelerating AI Inference Workloads with NVIDIA A30 GPU]]> http://www.open-lab.net/blog/?p=47944 2022-08-30T18:58:43Z 2022-05-11T22:43:14Z

NVIDIA A30 GPU is built on the latest NVIDIA Ampere Architecture to accelerate diverse workloads like AI inference at scale, enterprise training, and HPC...]]>

NVIDIA A30 GPU is built on the latest NVIDIA Ampere Architecture to accelerate diverse workloads like AI inference at scale, enterprise training, and HPC applications for mainstream servers in data centers. The A30 PCIe card combines the third-generation Tensor Cores with large HBM2 memory (24 GB) and fast GPU memory bandwidth (933 GB/s) in a low-power envelope (maximum 165 W).

]]> 1 Maggie Zhang <![CDATA[Deploying NVIDIA Triton at Scale with MIG and Kubernetes]]> http://www.open-lab.net/blog/?p=31573 2025-03-18T18:20:18Z 2021-08-26T03:00:00Z

NVIDIA Triton Inference Server is an open-source AI model serving software that simplifies the deployment of trained AI models at scale in production. Clients...]]>

Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. As of March 18, 2025, NVIDIA Triton Inference Server is now part of the NVIDIA Dynamo Platform and has been renamed to NVIDIA Dynamo Triton, accordingly. NVIDIA Triton Inference Server is an open-source AI model serving software that…

]]> 0 Maggie Zhang <![CDATA[Getting the Most Out of the NVIDIA A100 GPU with Multi-Instance GPU]]> http://www.open-lab.net/blog/?p=21816 2023-07-27T19:58:45Z 2020-12-01T00:30:40Z

With the third-generation Tensor Core technology, NVIDIA recently unveiled A100 Tensor Core GPU that delivers unprecedented acceleration at every scale for AI,...]]>

With the third-generation Tensor Core technology, NVIDIA recently unveiled A100 Tensor Core GPU that delivers unprecedented acceleration at every scale for AI, data analytics, and high-performance computing. Along with the great performance increase over prior generation GPUs comes another groundbreaking innovation, Multi-Instance GPU (MIG). With MIG, each A100 GPU can be partitioned up to seven…

]]> 11 Maggie Zhang <![CDATA[Getting Kubernetes Ready for the NVIDIA A100 GPU with Multi-Instance GPU]]> http://www.open-lab.net/blog/?p=22271 2023-07-27T19:59:33Z 2020-12-01T00:30:00Z

Multi-Instance GPU (MIG) is a new feature of the latest generation of NVIDIA GPUs, such as A100. It enables users to maximize the utilization of a single GPU by...]]>

]]> 4 Maggie Zhang <![CDATA[Training Your Own Voice Font Using Flowtron]]> http://www.open-lab.net/blog/?p=20673 2023-07-27T20:00:22Z 2020-10-03T23:40:08Z

Recent conversational AI research has demonstrated automatically generating high quality, human-like audio from text. For example, you can use Tacotron 2 and...]]>

Recent conversational AI research has demonstrated automatically generating high quality, human-like audio from text. For example, you can use Tacotron 2 and WaveGlow to convert text into high quality, natural-sounding speech in real time. You can also use FastPitch to generate mel spectrograms in parallel, achieving good speedup compared to Tacotron 2. However, current text-to-speech models…

]]> 0 Maggie Zhang <![CDATA[Generate Natural Sounding Speech from Text in Real-Time]]> http://www.open-lab.net/blog/?p=15579 2023-02-13T19:11:59Z 2019-09-10T22:56:08Z

Sign up for the latest Speech AI News from NVIDIA. This post, intended for developers with professional level understanding of deep learning, will help you...]]>

Sign up for the latest Speech AI News from NVIDIA. This post, intended for developers with professional level understanding of deep learning, will help you produce a production-ready, AI, text-to-speech model. Converting text into high quality, natural-sounding speech in real time has been a challenging conversational AI task for decades. State-of-the-art speech synthesis models are based on…

]]> 4