Shashank Verma – NVIDIA Technical Blog

News and tutorials for developers, data scientists, and IT admins | 2025-05-29T19:05:09Z | http://www.open-lab.net/blog/feed/

Shashank Verma | Run Hugging Face Models Instantly with Day-0 Support from NVIDIA NeMo Framework | http://www.open-lab.net/blog/?p=99933 | 2025-05-29T19:05:09Z | 2025-05-12T17:48:24Z

As organizations strive to maximize the value of their generative AI investments, accessing the latest model developments is crucial to continued success. By using state-of-the-art models on Day-0, teams can harness these innovations efficiently, maintain relevance, and stay competitive. The past year has seen a flurry of exciting model series releases in the open-source community…

Shashank Verma | Enhance Your AI Agent with Data Flywheels Using NVIDIA NeMo Microservices | http://www.open-lab.net/blog/?p=98721 | 2025-05-15T19:08:45Z | 2025-04-23T13:00:00Z

Enterprise data is constantly changing. This presents significant challenges for maintaining AI system accuracy over time. As organizations increasingly rely on agentic AI systems to optimize business processes, keeping these systems aligned with evolving business needs and new data becomes crucial. This post dives into how to build an iteration of a data flywheel using NVIDIA NeMo…

Shashank Verma | LLM Model Pruning and Knowledge Distillation with NVIDIA NeMo Framework | http://www.open-lab.net/blog/?p=93451 | 2025-04-23T02:53:00Z | 2025-02-12T17:54:52Z

Model pruning and knowledge distillation are powerful cost-effective strategies for obtaining smaller language models from an initial larger sibling. The How to Prune and Distill Llama-3.1 8B to an NVIDIA Llama-3.1-Minitron 4B Model post discussed the best practices of using large language models (LLMs) that combine depth, width, attention, and MLP pruning with knowledge distillation…

Shashank Verma | Leverage the Latest Open Models for Synthetic Data Generation with NVIDIA Nemotron-4-340B | http://www.open-lab.net/blog/?p=84322 | 2024-10-04T21:38:35Z | 2024-08-16T16:15:56Z

The Llama-3.1-Nemotron 70B-Reward model helps generate high-quality training data that aligns with human preferences for finance, retail, healthcare, scientific research, telecommunications, and sovereign AI. This post was updated on August 16, 2024 to reflect the most recent Reward Bench results. Since the introduction and subsequent wide adoption of large language models (LLMs)…

Shashank Verma | Customizing NVIDIA NIM for Domain-Specific Needs with NVIDIA NeMo | http://www.open-lab.net/blog/?p=84587 | 2025-02-17T05:27:27Z | 2024-07-10T18:16:21Z

Large language models (LLMs) adopted for specific enterprise applications most often benefit from model customization. Enterprises need to tailor LLMs for their specific needs and quickly deploy them for low-latency and high-throughput inferencing. This post will help you get started with this process. Specifically, we’ll show how to customize the Llama 3 8B NIM for answering questions in…

Shashank Verma | Seamlessly Deploying a Swarm of LoRA Adapters with NVIDIA NIM | http://www.open-lab.net/blog/?p=83606 | 2024-06-13T19:06:00Z | 2024-06-07T16:00:00Z

The latest state-of-the-art foundation large language models (LLMs) have billions of parameters and are pretrained on trillions of tokens of input text. They often achieve striking results on a wide variety of use cases without any need for customization. Despite this, studies have shown that the best accuracy on downstream tasks can be achieved by adapting LLMs with high-quality…

Shashank Verma | Unlock Your LLM Coding Potential with StarCoder2 | http://www.open-lab.net/blog/?p=78552 | 2024-03-07T19:32:10Z | 2024-02-28T14:00:00Z

Coding is essential in the digital age, but it can also be tedious and time-consuming. That’s why many developers are looking for ways to automate and streamline their coding tasks with the help of large language models (LLMs). These models are trained on massive amounts of code from permissively licensed GitHub repositories and can generate, analyze, and document code with little human…

Shashank Verma | Evaluating Retriever for Enterprise-Grade RAG | http://www.open-lab.net/blog/?p=78222 | 2024-10-28T21:59:05Z | 2024-02-23T19:02:26Z

The conversation about designing and evaluating Retrieval-Augmented Generation (RAG) systems is a long, multi-faceted discussion. Even when we look at retrieval on its own, developers selectively employ many techniques, such as query decomposition, re-writing, building soft filters, and more, to increase the accuracy of their RAG pipelines. While the techniques vary from system to system…

Shashank Verma | Generate Code, Answer Queries, and Translate Text with New NVIDIA AI Foundation Models | http://www.open-lab.net/blog/?p=77364 | 2024-05-07T19:14:10Z | 2024-02-05T18:48:17Z

This week’s Model Monday release features the NVIDIA-optimized Code Llama, Kosmos-2, and SeamlessM4T, which you can experience directly from your browser. With NVIDIA AI Foundation Models and Endpoints, you can access a curated set of community and NVIDIA-built generative AI models to experience, customize, and deploy in enterprise applications. Meta’s Code Llama 70B is the latest…

Shashank Verma | Query Graphs with Optimized DePlot Model | http://www.open-lab.net/blog/?p=77003 | 2024-05-07T16:48:52Z | 2024-01-23T00:34:34Z

NVIDIA AI Foundation Models and Endpoints provides access to a curated set of community and NVIDIA-built generative AI models to experience, customize, and deploy in enterprise applications. On Mondays throughout the year, we’ll be releasing new models. This week, we released the NVIDIA-optimized DePlot model, which you can experience directly from your browser. If you haven’t already…

Shashank Verma | Build Enterprise Retrieval-Augmented Generation Apps with NVIDIA Retrieval QA Embedding Model | http://www.open-lab.net/blog/?p=74346 | 2024-10-28T22:00:06Z | 2023-11-28T18:10:50Z

Large language models (LLMs) are transforming the AI landscape with their profound grasp of human and programming languages. Essential for next-generation enterprise productivity applications, they enhance user efficiency across tasks like programming, copy editing, brainstorming, and answering questions on a wide range of topics. However, these models often struggle with real-time events and…

Shashank Verma | Mastering LLM Techniques: Inference Optimization | http://www.open-lab.net/blog/?p=73739 | 2024-01-25T18:57:32Z | 2023-11-17T15:00:00Z

Stacking transformer layers to create large models results in better accuracies, few-shot learning capabilities, and even near-human emergent abilities on a wide range of language tasks. These foundation models are expensive to train, and they can be memory- and compute-intensive during inference (a recurring cost). The most popular large language models (LLMs) today can reach tens to hundreds of…

Shashank Verma | NVIDIA AI Foundation Models: Build Custom Enterprise Chatbots and Co-Pilots with Production-Ready LLMs | http://www.open-lab.net/blog/?p=73296 | 2024-11-20T23:03:22Z | 2023-11-15T16:00:00Z

Large language models (LLMs) are revolutionizing data science, enabling advanced capabilities in natural language understanding, AI, and machine learning. Custom LLMs, tailored for domain-specific insights, are finding increased traction in enterprise applications. The NVIDIA Nemotron-3 8B family of foundation models is a powerful new tool for building production-ready generative AI…

Shashank Verma | Scaling Recommendation System Inference with NVIDIA Merlin Hierarchical Parameter Server | http://www.open-lab.net/blog/?p=54195 | 2023-02-28T01:34:06Z | 2022-08-31T18:00:00Z

Recommendation systems are widely used today to personalize user experiences and improve customer engagement in various settings like e-commerce, social media, and news feeds. Serving user requests with low latency and high accuracy is critical to sustaining user engagement. This includes performing high-speed lookups and computations while seamlessly refreshing models with the newest…

Shashank Verma | Fast, Terabyte-Scale Recommender Training Made Easy with NVIDIA Merlin Distributed-Embeddings | http://www.open-lab.net/blog/?p=54372 | 2022-09-01T23:00:57Z | 2022-08-31T16:00:00Z

Embeddings play a key role in deep learning recommender models. They are used to map encoded categorical inputs in data to numerical values that can be processed by the math layers or multilayer perceptrons (MLPs). Embeddings often constitute most of the parameters in deep learning recommender models and can be quite large, even reaching into the terabyte scale. It can be difficult to fit…

Shashank Verma | Building and Deploying Conversational AI Models Using NVIDIA TAO Toolkit | http://www.open-lab.net/blog/?p=24079 | 2023-03-22T01:16:50Z | 2021-11-09T16:15:24Z

Conversational AI is a set of technologies enabling human-like interactions between humans and devices based on the most natural interfaces for us: speech and natural language. Systems based on conversational AI can understand commands by recognizing speech and text, translating on the fly between different languages…

Shashank Verma | Continuously Improving Recommender Systems for Competitive Advantage Using NVIDIA Merlin and MLOps | http://www.open-lab.net/blog/?p=33639 | 2024-10-28T19:22:30Z | 2021-07-01T00:23:02Z

Recommender systems are a critical resource for enterprises that are relentlessly striving to improve customer engagement. They work by suggesting potentially relevant products and services amongst an overwhelmingly large and ever-increasing number of offerings. NVIDIA Merlin is an application framework that accelerates all phases of recommender system development on NVIDIA GPUs…
