Meta recently released its Llama 3.2 series of vision language models (VLMs), which come in 11B-parameter and 90B-parameter variants. These models are multimodal, supporting both text and image inputs. In addition, Meta has launched text-only small language model (SLM) variants of Llama 3.2 with 1B and 3B parameters. NVIDIA has optimized the Llama 3.2 collection of models for great performance and…
Microsoft Bing Visual Search enables people around the world to find content using photographs as queries. The heart of this capability is Microsoft's TuringMM visual embedding model, which maps images and text into a shared high-dimensional space. Because the service operates on billions of images across the web, performance is critical. This post details efforts to optimize the TuringMM pipeline using NVIDIA…
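For readers unfamiliar with shared embedding spaces, here is a minimal, self-contained sketch of how retrieval works once an image and several captions have been embedded. The random vectors below are stand-ins for the outputs of real image and text encoders such as TuringMM; the dimensions and setup are purely illustrative.

```python
import numpy as np

# Stand-in embeddings for one image and three candidate captions; a real
# system would produce these with the image and text encoder towers.
rng = np.random.default_rng(0)
image_vec = rng.standard_normal(512)
text_vecs = rng.standard_normal((3, 512))

# L2-normalize so that a dot product equals cosine similarity
image_vec /= np.linalg.norm(image_vec)
text_vecs /= np.linalg.norm(text_vecs, axis=1, keepdims=True)

scores = text_vecs @ image_vec          # higher score = closer in the shared space
print("best caption index:", scores.argmax())
```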
NVIDIA TensorRT, an established inference library for data centers, has rapidly emerged as a desirable inference backend for NVIDIA GeForce RTX and NVIDIA RTX GPUs. Now, deploying TensorRT into apps has gotten even easier with prebuilt TensorRT engines. The newly released TensorRT 10.0 with weight-stripped engines offers a unique solution for minimizing the engine shipment size by reducing…
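As a rough build-side sketch (the flag and API names follow the TensorRT 10 Python bindings as I understand them, and model.onnx is a placeholder path), a weight-stripped engine might be produced like this:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(0)          # explicit-batch network
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.STRIP_PLAN)  # serialize the engine without weights
with open("model_stripped.plan", "wb") as f:
    f.write(builder.build_serialized_network(network, config))
# At install time, the weights are refit back into the engine,
# for example from the original ONNX file.
```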
In this post, we delve deeper into the inference optimization process to improve the performance and efficiency of our machine learning models during the inference stage. We discuss the techniques employed, such as inference computation graph simplification, quantization, and lowering precision. We also showcase the benchmarking results of our scene text detection and recognition models…
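As an example of one such technique, ONNX Runtime ships a post-training dynamic quantization utility. The sketch below uses a hypothetical text_detector.onnx model file to show the shape of the call:

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Post-training dynamic quantization: weights are stored as INT8 on disk
# and activations are quantized on the fly at run time.
quantize_dynamic(
    model_input="text_detector.onnx",        # hypothetical FP32 model
    model_output="text_detector_int8.onnx",  # smaller, faster INT8 variant
    weight_type=QuantType.QInt8,
)
```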
To make scene text detection and recognition work on irregular text or for specific use cases, you must have full control of your model so that you can perform incremental learning or fine-tuning for your particular use cases and datasets. Keep in mind that this pipeline is the main building block of scene understanding, AI-based inspection, and document processing platforms. It should be accurate and have low…
Identifying and recognizing text from natural scenes and images has become important for use cases such as video caption text recognition, detecting signboards from vehicle-mounted cameras, information retrieval, scene understanding, vehicle number plate recognition, and recognizing text on products. Most of these use cases require near real-time performance. The common technique for text…
This post is part of a series about optimizing end-to-end AI. The performance of AI models is heavily influenced by the precision of the computational resources being used. Lower precision can lead to faster processing speeds and reduced memory usage, while higher precision can contribute to more accurate results. Finding the right balance between precision and performance is crucial for…
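A quick way to observe that trade-off in practice is PyTorch's autocast, which runs matmul-heavy operations in FP16 while keeping numerically sensitive ones in FP32. The toy model below is purely illustrative:

```python
import torch
from torch import nn

# Toy model, chosen only to make the example self-contained
model = nn.Linear(512, 512).cuda().eval()
x = torch.randn(8, 512, device="cuda")

with torch.no_grad():
    y_fp32 = model(x)                         # full-precision baseline

# Mixed precision: matmul-heavy ops run in FP16, sensitive ops stay in FP32
with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
    y_fp16 = model(x)

# Quantify the accuracy cost of the lower precision
print((y_fp32 - y_fp16.float()).abs().max().item())
```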
This post is part of a series about optimizing end-to-end AI. While NVIDIA hardware can process the individual operations that constitute a neural network incredibly fast, it is important to ensure that you are using the tools correctly. Using tools such as ONNX Runtime or TensorRT out of the box with ONNX usually gives you good performance, but why settle for good performance…
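One knob beyond the out-of-the-box defaults is ONNX Runtime's graph optimization level. This sketch (model.onnx is a placeholder path) enables the full set of graph rewrites and dumps the optimized graph so you can inspect what was fused:

```python
import onnxruntime as ort

opts = ort.SessionOptions()
# Enable the full set of graph rewrites (constant folding, node fusion, ...)
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
# Save the optimized graph so you can inspect what actually got fused
opts.optimized_model_filepath = "model_optimized.onnx"

sess = ort.InferenceSession(
    "model.onnx",                     # placeholder model path
    sess_options=opts,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
```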
This post is the fifth in a series about optimizing end-to-end AI. NVIDIA TensorRT is a solution for speed-of-light inference deployment on NVIDIA hardware. Provided with an AI model architecture, TensorRT can be used pre-deployment to run an exhaustive search for the most efficient execution strategy. TensorRT optimizations include reordering operations in a graph…
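In the Python API, that pre-deployment search is driven by the builder configuration. The sketch below uses placeholder file names, and builder_optimization_level is assumed from recent TensorRT releases; raising it trades longer build time for a wider tactic search:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(0)
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:          # placeholder model path
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)        # let the search consider FP16 kernels
config.builder_optimization_level = 5        # widen the tactic search (slower build)
plan = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:
    f.write(plan)
```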
Learn how AI is boosting creative applications for creators during NVIDIA GTC 2023, March 20-23.
This post is the fourth in a series about optimizing end-to-end AI. As explained in the previous post in the End-to-End AI for NVIDIA-Based PCs series, there are multiple execution providers (EPs) in ONNX Runtime that enable the use of hardware-specific features or optimizations for a given deployment scenario. This post covers the CUDA EP and TensorRT EP using the highly optimized NVIDIA…
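Selecting EPs is a one-line change at session creation. In this sketch the TensorRT EP is tried first (the trt_fp16_enable option name is taken from the ONNX Runtime documentation as I recall it), falling back to CUDA and then CPU:

```python
import onnxruntime as ort

# EPs are tried in priority order; unavailable ones are skipped at run time.
providers = [
    ("TensorrtExecutionProvider", {"trt_fp16_enable": True}),  # option name per ORT docs
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]
sess = ort.InferenceSession("model.onnx", providers=providers)
print(sess.get_providers())   # shows which EPs were actually activated
```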
This post is the third in a series about optimizing end-to-end AI. When your model has been converted to the ONNX format, there are several ways to deploy it, each with advantages and drawbacks. One method is to use ONNX Runtime. ONNX Runtime serves as the backend, reading a model from an intermediate representation (ONNX), handling the inference session, and scheduling execution on an…
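A minimal inference session looks like the following; the model path and input shape are assumed for illustration:

```python
import numpy as np
import onnxruntime as ort

# Load the ONNX intermediate representation and start an inference session
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

input_name = sess.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # shape assumed for illustration
outputs = sess.run(None, {input_name: x})   # None = return all model outputs
print(outputs[0].shape)
```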
This post is the second in a series about optimizing end-to-end AI. In this post, I discuss how to use ONNX to transition your AI models from research to production while avoiding common mistakes. Considering that PyTorch has become the most popular machine learning framework, all my examples use it, but I also supply references to TensorFlow tutorials. ONNX (Open Neural Network…
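A typical export, plus the validation step that catches many such mistakes early, might look like this (ResNet-18 is chosen purely as an example; dynamic_axes avoids baking a fixed batch size into the graph):

```python
import torch
import torchvision
import onnx

# Example model only; any traced or scripted PyTorch module works the same way
model = torchvision.models.resnet18(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model, dummy, "resnet18.onnx",
    input_names=["input"], output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}, "logits": {0: "batch"}},  # keep batch flexible
    opset_version=17,
)

# Validate the exported graph before shipping it anywhere
onnx.checker.check_model(onnx.load("resnet18.onnx"))
```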
This post is the first in a series about optimizing end-to-end AI. The great thing about the GPU is that it offers tremendous parallelism; it allows you to perform many tasks at the same time. At its most granular level, this comes down to the fact that there are thousands of tiny processing cores that run the same instruction at the same time. But that is not where such parallelism stops.
Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. Every AI application needs a strong inference engine. Whether you're deploying an image recognition service, intelligent virtual assistant, or a fraud detection application, a reliable inference server delivers fast, accurate…
TensorRT is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and a runtime that delivers low latency and high throughput for deep learning applications. TensorRT uses the ONNX format as an intermediate representation for converting models from major frameworks such as TensorFlow and PyTorch. In this post, you learn how to convert PyTorch…
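Once the ONNX model has been built into an engine (see the builder sketch above), running it from Python might look like the following. The tensor names and shapes match the hypothetical export above, and set_tensor_address/execute_async_v3 follow the TensorRT 10 bindings as I understand them:

```python
import tensorrt as trt
import torch

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open("model.plan", "rb") as f:          # engine built from the exported ONNX
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate I/O on the GPU with torch and hand TensorRT the raw pointers;
# tensor names ("input", "logits") and shapes are assumed from the export above.
inp = torch.randn(1, 3, 224, 224, device="cuda")
out = torch.empty(1, 1000, device="cuda")
context.set_tensor_address("input", inp.data_ptr())
context.set_tensor_address("logits", out.data_ptr())

stream = torch.cuda.Stream()
context.execute_async_v3(stream.cuda_stream)
stream.synchronize()
```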
Microsoft and NVIDIA have collaborated to build, validate, and publish the ONNX Runtime Python package and Docker container for the NVIDIA Jetson platform, now available on the Jetson Zoo. Today's release of ONNX Runtime for Jetson extends the performance and portability benefits of ONNX Runtime to Jetson edge AI systems, allowing models from many different frameworks…
As more and more deep learning models are deployed into production environments, there is a growing need to separate work on the model itself from the work of integrating it into a production pipeline. Windows ML meets this demand by enabling efficient deployment of pretrained deep learning models into Windows applications. Developing and training the model itself…
Every year, clever researchers introduce ever more complex and interesting deep learning models to the world. There is of course a big difference between a model that works as a nice demo in isolation and a model that performs a function within a production pipeline. This is particularly pertinent to creative apps where generative models must run with low latency to generate or enhance image…
NVIDIA has released TensorRT 4 at CVPR 2018. This new version of NVIDIA's powerful inference optimizer and runtime engine provides several new capabilities. Additional features include the ability to execute custom neural network layers using FP16 precision and support for the Xavier SoC through NVIDIA DRIVE AI platforms. TensorRT 4 speeds up deep learning inference applications such as neural machine…