Davide Onofrio – NVIDIA Technical Blog

News and tutorials for developers, data scientists, and IT admins 2025-03-18T18:20:18Z http://www.open-lab.net/blog/feed/ Davide Onofrio <![CDATA[Autoscaling NVIDIA Riva Deployment with Kubernetes for Speech AI in Production]]> http://www.open-lab.net/blog/?p=59514 2023-10-20T18:16:30Z 2023-01-12T17:30:00Z

Speech AI applications, from call centers to virtual assistants, rely heavily on automatic speech recognition (ASR) and text-to-speech (TTS). ASR can process...]]>

Speech AI applications, from call centers to virtual assistants, rely heavily on automatic speech recognition (ASR) and text-to-speech (TTS). ASR can process the audio signal and transcribe the audio to text. Speech synthesis or TTS can generate high-quality, natural-sounding audio from the text in real time. The challenge of Speech AI is to achieve high accuracy and meet the latency requirements…

]]> 0 Davide Onofrio <![CDATA[Introducing NVIDIA Riva: A GPU-Accelerated SDK for Developing Speech AI Applications]]> http://www.open-lab.net/blog/?p=17451 2023-05-22T22:12:28Z 2022-12-08T23:37:19Z

This post was updated in March 2023. Sign up for the latest Speech AI news from NVIDIA. Speech AI is used in a variety of applications, including contact...]]>

This post was updated in March 2023. Sign up for the latest Speech AI news from NVIDIA. Speech AI is used in a variety of applications, including contact centers’ agent assists for empowering human agents, voice interfaces for intelligent virtual assistants (IVAs), and live captioning in video conferencing. To support these features, speech AI technology includes automatic speech recognition…

]]> 3 Davide Onofrio <![CDATA[Developing the Next Generation of Extended Reality Applications with Speech AI]]> http://www.open-lab.net/blog/?p=54831 2023-11-03T07:15:10Z 2022-09-14T16:00:00Z

Virtual reality (VR), augmented reality (AR), and mixed reality (MR) environments can feel incredibly real due to the physically immersive experience. Adding a...]]>

]]> 0 Davide Onofrio <![CDATA[Dividing NVIDIA A30 GPUs and Conquering Multiple Workloads]]> http://www.open-lab.net/blog/?p=50380 2023-04-04T16:58:51Z 2022-08-30T19:00:35Z

Multi-Instance GPU (MIG) is an important feature of NVIDIA H100, A100, and A30 Tensor Core GPUs, as it can partition a GPU into multiple instances. Each...]]>

Multi-Instance GPU (MIG) is an important feature of NVIDIA H100, A100, and A30 Tensor Core GPUs, as it can partition a GPU into multiple instances. Each instance has its own compute cores, high-bandwidth memory, L2 cache, DRAM bandwidth, and media engines such as decoders. This enables multiple workloads or multiple users to run workloads simultaneously on one GPU to maximize the GPU…

]]> 0 Davide Onofrio <![CDATA[Accelerating AI Inference Workloads with NVIDIA A30 GPU]]> http://www.open-lab.net/blog/?p=47944 2022-08-30T18:58:43Z 2022-05-11T22:43:14Z

NVIDIA A30 GPU is built on the latest NVIDIA Ampere Architecture to accelerate diverse workloads like AI inference at scale, enterprise training, and HPC...]]>

NVIDIA A30 GPU is built on the latest NVIDIA Ampere Architecture to accelerate diverse workloads like AI inference at scale, enterprise training, and HPC applications for mainstream servers in data centers. The A30 PCIe card combines the third-generation Tensor Cores with large HBM2 memory (24 GB) and fast GPU memory bandwidth (933 GB/s) in a low-power envelope (maximum 165 W).

]]> 1 Davide Onofrio <![CDATA[Deploying NVIDIA Triton at Scale with MIG and Kubernetes]]> http://www.open-lab.net/blog/?p=31573 2025-03-18T18:20:18Z 2021-08-26T03:00:00Z

NVIDIA Triton Inference Server is an open-source AI model serving software that simplifies the deployment of trained AI models at scale in production. Clients...]]>

Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. As of March 18, 2025, NVIDIA Triton Inference Server is now part of the NVIDIA Dynamo Platform and has been renamed to NVIDIA Dynamo Triton, accordingly. NVIDIA Triton Inference Server is an open-source AI model serving software that…

]]> 0 Davide Onofrio <![CDATA[Real-Time Natural Language Processing with BERT Using NVIDIA TensorRT (Updated)]]> http://www.open-lab.net/blog/?p=34688 2023-06-12T21:08:51Z 2021-07-20T13:00:00Z

This post was originally published in August 2019 and has been updated for NVIDIA TensorRT 8.0. Large-scale language models (LSLMs) such as BERT, GPT-2, and...]]>

This post was originally published in August 2019 and has been updated for NVIDIA TensorRT 8.0. Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. Large-scale language models (LSLMs) such as BERT, GPT-2, and XL-Net have brought exciting leaps in accuracy for many natural language processing…

]]> 0 Davide Onofrio <![CDATA[Continuously Improving Recommender Systems for Competitive Advantage Using NVIDIA Merlin and MLOps]]> http://www.open-lab.net/blog/?p=33639 2024-10-28T19:22:30Z 2021-07-01T00:23:02Z

Recommender systems are a critical resource for enterprises that are relentlessly striving to improve customer engagement. They work by suggesting potentially...]]>

Recommender systems are a critical resource for enterprises that are relentlessly striving to improve customer engagement. They work by suggesting potentially relevant products and services amongst an overwhelmingly large and ever-increasing number of offerings. NVIDIA Merlin is an application framework that accelerates all phases of recommender system development on NVIDIA GPUs…

]]> 2 Davide Onofrio <![CDATA[MLPerf v1.0 Training Benchmarks: Insights into a Record-Setting NVIDIA Performance]]> http://www.open-lab.net/blog/?p=33929 2023-07-05T19:31:00Z 2021-06-30T17:00:00Z

MLPerf is an industry-wide AI consortium tasked with developing a suite of performance benchmarks that cover a range of leading AI workloads widely in use. The...]]>

MLPerf is an industry-wide AI consortium tasked with developing a suite of performance benchmarks that cover a range of leading AI workloads widely in use. The latest MLPerf v1.0 training round includes vision, language and recommender systems, and reinforcement learning tasks. It is continually evolving to reflect the state-of-the-art AI applications. NVIDIA submitted MLPerf v1.0…

]]> 1 Davide Onofrio <![CDATA[Minimizing Deep Learning Inference Latency with NVIDIA Multi-Instance GPU]]> http://www.open-lab.net/blog/?p=22868 2022-08-21T23:40:50Z 2020-12-18T18:39:52Z

Recently, NVIDIA unveiled the A100 GPU model, based on the NVIDIA Ampere architecture. Ampere introduced many features, including Multi-Instance GPU (MIG), that...]]>

Recently, NVIDIA unveiled the A100 GPU model, based on the NVIDIA Ampere architecture. Ampere introduced many features, including Multi-Instance GPU (MIG), that play a special role for deep learning-based (DL) applications. MIG makes it possible to use a single A100 GPU as if it were multiple smaller GPUs, maximizing utilization for DL workloads and providing dynamic scalability.

]]> 1 Davide Onofrio <![CDATA[Simplifying and Scaling Inference Serving with NVIDIA Triton 2.3]]> http://www.open-lab.net/blog/?p=21209 2023-03-22T01:09:07Z 2020-10-05T13:00:00Z

AI, machine learning (ML), and deep learning (DL) are effective tools for solving diverse computing problems such as product recommendations, customer...]]>

AI, machine learning (ML), and deep learning (DL) are effective tools for solving diverse computing problems such as product recommendations, customer interactions, financial risk assessment, manufacturing defect detection, and more. Using an AI model in production, called inference serving, is the most complex part of incorporating AI in applications. Triton Inference Server takes care of all the…

]]> 0 Davide Onofrio <![CDATA[Integrating NVIDIA Triton Inference Server with Kaldi ASR]]> http://www.open-lab.net/blog/?p=19647 2022-08-21T23:40:33Z 2020-08-14T18:26:32Z

Speech processing is compute-intensive and requires a powerful and flexible platform to power modern conversational AI applications. It seemed natural to...]]>

Speech processing is compute-intensive and requires a powerful and flexible platform to power modern conversational AI applications. It seemed natural to combine the de facto standard platform for automatic speech recognition (ASR), the Kaldi Speech Recognition Toolkit, with the power and flexibility of NVIDIA GPUs. Kaldi adopted GPU acceleration for training workloads early on.

]]> 3 Davide Onofrio <![CDATA[Real-Time Natural Language Understanding with BERT Using TensorRT]]> http://www.open-lab.net/blog/?p=15432 2022-10-10T18:51:43Z 2019-08-13T13:00:19Z

Large scale language models (LSLMs) such as BERT, GPT-2, and XL-Net have brought about exciting leaps in state-of-the-art accuracy for many natural language...]]>

Large-scale language models (LSLMs) such as BERT, GPT-2, and XL-Net have brought about exciting leaps in state-of-the-art accuracy for many natural language understanding (NLU) tasks. Since its release in October 2018, BERT (Bidirectional Encoder Representations from Transformers) has remained one of the most popular language models and still delivers state-of-the-art accuracy at the time of writing.

]]> 11