Shubham Agrawal – NVIDIA Technical Blog News and tutorials for developers, data scientists, and IT admins 2025-05-19T02:05:15Z http://www.open-lab.net/blog/feed/ Shubham Agrawal <![CDATA[Advance Video Analytics AI Agents Using the NVIDIA AI Blueprint for Video Search and Summarization]]> http://www.open-lab.net/blog/?p=98690 2025-05-19T02:05:15Z 2025-05-19T06:00:00Z Vision language models (VLMs) have transformed video analytics by enabling broader perception and richer contextual understanding compared to traditional...]]>

Vision language models (VLMs) have transformed video analytics by enabling broader perception and richer contextual understanding compared to traditional computer vision (CV) models. However, challenges like limited context length and lack of audio transcription still exist, restricting how much video a VLM can process at a time. To overcome this, the NVIDIA AI Blueprint for video search and…
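The context-length limitation described above is commonly worked around by splitting a long video into shorter segments, captioning each segment, and merging the captions. The sketch below illustrates that idea only; the `caption_chunk` and `summarize` functions are placeholders standing in for real VLM/LLM calls, not the blueprint's actual API.

```python
# Hedged sketch: split a long video into fixed-length (optionally overlapping)
# chunks so each fits within a VLM's context window, caption each chunk,
# then merge the per-chunk captions into one summary.

def chunk_intervals(duration_s: float, chunk_s: float, overlap_s: float = 0.0):
    """Return (start, end) second offsets covering the full video."""
    step = chunk_s - overlap_s
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk length")
    intervals = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        intervals.append((start, end))
        if end >= duration_s:
            break
        start += step
    return intervals

def caption_chunk(start: float, end: float) -> str:
    # Placeholder for a VLM call on the video segment [start, end).
    return f"caption for {start:.0f}-{end:.0f}s"

def summarize(captions) -> str:
    # Placeholder for an LLM aggregation pass over per-chunk captions.
    return " | ".join(captions)

if __name__ == "__main__":
    spans = chunk_intervals(duration_s=90, chunk_s=30, overlap_s=5)
    print(summarize(caption_chunk(s, e) for s, e in spans))
```

Overlap between chunks is a common choice here so that events spanning a chunk boundary are not lost to either caption.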

Source

]]>
Shubham Agrawal <![CDATA[Build Real-Time Multimodal XR Apps with NVIDIA AI Blueprint for Video Search and Summarization]]> http://www.open-lab.net/blog/?p=96842 2025-03-12T22:08:59Z 2025-03-11T17:30:00Z With the recent advancements in generative AI and vision foundation models, VLMs present a new wave of visual computing wherein the models are capable of...]]>

With the recent advancements in generative AI and vision foundation models, VLMs present a new wave of visual computing in which models are capable of highly sophisticated perception and deep contextual understanding. These intelligent solutions offer a promising means of enhancing semantic comprehension in XR settings. By integrating VLMs, developers can significantly improve how XR…

Source

]]>
Shubham Agrawal <![CDATA[Vision Language Model Prompt Engineering Guide for Image and Video Understanding]]> http://www.open-lab.net/blog/?p=96229 2025-04-23T02:38:32Z 2025-02-26T16:25:34Z Vision language models (VLMs) are evolving at a breakneck speed. In 2020, the first VLMs revolutionized the generative AI landscape by bringing visual...]]>

Vision language models (VLMs) are evolving at a breakneck speed. In 2020, the first VLMs revolutionized the generative AI landscape by bringing visual understanding to large language models (LLMs) through the use of a vision encoder. These initial VLMs were limited in their abilities, only able to understand text and single image inputs. Fast-forward a few years and VLMs are now capable of…
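A basic prompt-engineering pattern for image understanding is to pair a system instruction with a user turn that mixes text and an image reference. The sketch below uses the OpenAI-compatible chat message format that many VLM endpoints accept; the URL and system hint are illustrative assumptions, not values from the guide.

```python
# Hedged sketch: building a structured VLM prompt for image understanding.
# The message schema follows the widely used OpenAI-compatible chat format;
# the specific image URL and instructions here are placeholders.

def build_vlm_prompt(image_url: str, question: str, system_hint: str):
    """Return a chat-style message list combining text and an image input."""
    return [
        {"role": "system", "content": system_hint},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        },
    ]

messages = build_vlm_prompt(
    image_url="https://example.com/frame.jpg",
    question="List any safety violations visible in this image.",
    system_hint=(
        "You are a video analytics assistant. Answer concisely and only "
        "describe what is visually present."
    ),
)
print(messages[1]["content"][0]["text"])
```

Keeping the system instruction narrow ("only describe what is visually present") is a typical prompt-engineering tactic to reduce hallucinated detail in VLM outputs.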

Source

]]>
Shubham Agrawal <![CDATA[Build an Agentic Video Workflow with Video Search and Summarization]]> http://www.open-lab.net/blog/?p=92834 2025-01-07T05:45:50Z 2024-12-03T18:30:00Z Building a question-answering chatbot with large language models (LLMs) is now a common workflow for text-based interactions. What about creating an AI system...]]>

Building a question-answering chatbot with large language models (LLMs) is now a common workflow for text-based interactions. What about creating an AI system that can answer questions about video and image content? This presents a far more complex task. Traditional video analytics tools struggle due to their limited functionality and a narrow focus on predefined objects.

Source

]]>
Shubham Agrawal <![CDATA[New Foundational Models and Training Capabilities with NVIDIA TAO 5.5]]> http://www.open-lab.net/blog/?p=87263 2024-09-09T19:37:08Z 2024-08-28T16:00:00Z NVIDIA TAO is a framework designed to simplify and accelerate the development and deployment of AI models. It enables you to use pretrained models, fine-tune...]]>

NVIDIA TAO is a framework designed to simplify and accelerate the development and deployment of AI models. It enables you to use pretrained models, fine-tune them with your own data, and optimize the models for specific use cases without needing deep AI expertise. TAO integrates seamlessly with the NVIDIA hardware and software ecosystem, providing tools for efficient AI model training…

Source

]]>