As organizations strive to maximize the value of their generative AI investments, accessing the latest model developments is crucial to continued success. By using state-of-the-art models on Day-0, teams can harness these innovations efficiently, maintain relevance, and be competitive. The past year has seen a flurry of exciting model series releases in the open-source community…
]]>Enterprise data is constantly changing. This presents significant challenges for maintaining AI system accuracy over time. As organizations increasingly rely on agentic AI systems to optimize business processes, keeping these systems aligned with evolving business needs and new data becomes crucial. This post dives into how to build an iteration of a data flywheel using NVIDIA NeMo…
]]>Model pruning and knowledge distillation are powerful cost-effective strategies for obtaining smaller language models from an initial larger sibling. The How to Prune and Distill Llama-3.1 8B to an NVIDIA Llama-3.1-Minitron 4B Model post discussed the best practices of using large language models (LLMs) that combine depth, width, attention, and MLP pruning with knowledge distillation…
]]>The Llama-3.1-Nemotron 70B-Reward model helps generate high-quality training data that aligns with human preferences for finance, retail, healthcare, scientific research, telecommunications, and sovereign AI. This post was updated on August 16, 2024 to reflect the most recent Reward Bench results. Since the introduction and subsequent wide adoption of large language models (LLMs)…
]]>Large language models (LLMs) adopted for specific enterprise applications most often benefit from model customization. Enterprises need to tailor LLMs for their specific needs and quickly deploy them for low-latency and high-throughput inferencing. This post will help you get started with this process. Specifically, we’ll show how to customize the Llama 3 8B NIM for answering questions in…
]]>The latest state-of-the-art foundation large language models (LLMs) have billions of parameters and are pretrained on trillions of tokens of input text. They often achieve striking results on a wide variety of use cases without any need for customization. Despite this, studies have shown that the best accuracy on downstream tasks can be achieved by adapting LLMs with high-quality…
]]>Coding is essential in the digital age, but it can also be tedious and time-consuming. That’s why many developers are looking for ways to automate and streamline their coding tasks with the help of large language models (LLMs). These models are trained on massive amounts of code from permissively licensed GitHub repositories and can generate, analyze, and document code with little human…
]]>The conversation about designing and evaluating Retrieval-Augmented Generation (RAG) systems is a long, multi-faceted discussion. Even when we look at retrieval on its own, developers selectively employ many techniques, such as query decomposition, re-writing, building soft filters, and more, to increase the accuracy of their RAG pipelines. While the techniques vary from system to system…
]]>This week’s Model Monday release features the NVIDIA-optimized code Llama, Kosmos-2, and SeamlessM4T, which you can experience directly from your browser. With NVIDIA AI Foundation Models and Endpoints, you can access a curated set of community and NVIDIA-built generative AI models to experience, customize, and deploy in enterprise applications. Meta’s Code Llama 70B is the latest…
]]>NVIDIA AI Foundation Models and Endpoints provides access to a curated set of community and NVIDIA-built generative AI models to experience, customize, and deploy in enterprise applications. On Mondays throughout the year, we’ll be releasing new models. This week, we released the NVIDIA-optimized DePlot model, which you can experience directly from your browser. If you haven’t already…
]]>Large language models (LLMs) are transforming the AI landscape with their profound grasp of human and programming languages. Essential for next-generation enterprise productivity applications, they enhance user efficiency across tasks like programming, copy editing, brainstorming, and answering questions on a wide range of topics. However, these models often struggle with real-time events and…
]]>Stacking transformer layers to create large models results in better accuracies, few-shot learning capabilities, and even near-human emergent abilities on a wide range of language tasks. These foundation models are expensive to train, and they can be memory- and compute-intensive during inference (a recurring cost). The most popular large language models (LLMs) today can reach tens to hundreds of…
]]>Large language models (LLMs) are revolutionizing data science, enabling advanced capabilities in natural language understanding, AI, and machine learning. Custom LLMs, tailored for domain-specific insights, are finding increased traction in enterprise applications. The NVIDIA Nemotron-3 8B family of foundation models is a powerful new tool for building production-ready generative AI…
]]>Recommendation systems are widely used today to personalize user experiences and improve customer engagement in various settings like e-commerce, social media, and news feeds. Serving user requests with low latency and high accuracy is critical to sustaining user engagement. This includes performing high-speed lookups and computations while seamlessly refreshing models with the newest…
]]>Embeddings play a key role in deep learning recommender models. They are used to map encoded categorical inputs in data to numerical values that can be processed by the math layers or multilayer perceptrons (MLPs). Embeddings often constitute most of the parameters in deep learning recommender models and can be quite large, even reaching into the terabyte scale. It can be difficult to fit…
]]>Sign up for the latest Speech AI news from NVIDIA. Conversational AI is a set of technologies enabling human-like interactions between humans and devices based on the most natural interfaces for us: speech and natural language. Systems based on conversational AI can understand commands by recognizing speech and text, translating on-the-fly between different languages…
]]>Recommender systems are a critical resource for enterprises that are relentlessly striving to improve customer engagement. They work by suggesting potentially relevant products and services amongst an overwhelmingly large and ever-increasing number of offerings. NVIDIA Merlin is an application framework that accelerates all phases of recommender system development on NVIDIA GPUs…
]]>