As AI workloads grow in complexity and scale – from large language models (LLMs) to agentic AI reasoning and physical AI – the demand for faster, more scalable compute infrastructure has never been greater. Meeting these demands requires rethinking system architecture from the ground up. NVIDIA is advancing platform architecture with NVIDIA ConnectX-8 SuperNICs, the industry's first SuperNIC to…
Developers building powerful multimodal applications now have a new state-of-the-art model designed for enterprise-scale performance with Mistral Medium 3. Mistral Medium 3 combines high performance, efficiency, and versatility in a compact deployment footprint. Designed for commercial and on-prem use cases, this dense model runs efficiently on NVIDIA Hopper GPUs…
The exponential growth of generative AI, large language models (LLMs), and high-performance computing has created unprecedented demands on data center infrastructure. Traditional server architectures struggle to accommodate the power density, thermal requirements, and rapid iteration cycles of modern accelerated computing. This post explains the benefits of NVIDIA MGX…
Synthetic data has become a standard part of large language model (LLM) post-training procedures. Using a large number of synthetically generated examples from either a single open-source, commercially permissible LLM or a cohort of them, a base LLM is fine-tuned with supervised fine-tuning or RLHF to gain instruction-following and reasoning skills. This process can be seen as a knowledge…
The integration of NVIDIA NIM microservices into Azure AI Foundry marks a major leap forward in enterprise AI development. By combining NIM microservices with Azure's scalable, secure infrastructure, organizations can now deploy powerful, ready-to-use AI models more efficiently than ever before. NIM microservices are containerized for GPU-accelerated inferencing of pretrained and customized…
As organizations strive to maximize the value of their generative AI investments, accessing the latest model developments is crucial to continued success. By using state-of-the-art models on Day-0, teams can harness these innovations efficiently, maintain relevance, and be competitive. The past year has seen a flurry of exciting model series releases in the open-source community…
Scientific research in complex fields like battery innovation is often slowed by manual evaluation of materials, limiting progress to just dozens of candidates per day. In this blog post, we explore how domain-adapted large language models (LLMs), enhanced with reasoning capabilities, are transforming scientific research, especially in high-stakes, complex domains like battery innovation.
Multi-data center training is becoming essential for AI factories as pretraining scaling fuels the creation of even larger models, leading the demand for computing performance to outpace the capabilities of a single facility. By distributing workloads across multiple data centers, organizations can overcome limitations in power, cooling, and space, enabling the training of even larger…
Apache Spark is an industry-leading platform for big data processing and analytics. With the increasing prevalence of unstructured data – documents, emails, multimedia content – deep learning (DL) and large language models (LLMs) have become core components of the modern data analytics pipeline. These models enable a variety of downstream tasks, such as image captioning, semantic tagging…
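As a rough illustration of how such a pipeline can look, the sketch below runs batched image captioning inside Spark with a pandas UDF; the captioning model, column names, and storage paths are assumptions, not details from the post.

```python
# Minimal sketch: batched image captioning inside a Spark pipeline via a pandas UDF.
# Model name, columns, and paths are illustrative assumptions.
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("dl-inference-sketch").getOrCreate()

@pandas_udf(StringType())
def caption_udf(image_paths: pd.Series) -> pd.Series:
    # Simplified: the model is loaded inside the UDF; real pipelines cache it per executor.
    from transformers import pipeline
    captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
    results = captioner(image_paths.tolist())
    return pd.Series([r[0]["generated_text"] for r in results])

df = spark.read.parquet("s3://bucket/images_metadata")          # assumed input table
df = df.withColumn("caption", caption_udf(df["image_path"]))    # assumed column name
df.write.parquet("s3://bucket/images_with_captions")
```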
Curating high-quality pretraining datasets is critical for enterprise developers aiming to train state-of-the-art large language models (LLMs). To enable developers to build highly accurate LLMs, NVIDIA previously released Nemotron-CC, a 6.3-trillion-token English language Common Crawl (CC) dataset. Today, the NVIDIA NeMo Curator team is excited to share that the pipeline used to build the…
This is the second post in the LLM Benchmarking series, which shows how to use GenAI-Perf to benchmark the Meta Llama 3 model when deployed with NVIDIA NIM. When building LLM-based applications, it is critical to understand the performance characteristics of these models on given hardware. This serves multiple purposes: As a client-side LLM-focused benchmarking tool…
Alibaba recently released Tongyi Qwen3, a family of open-source hybrid-reasoning large language models (LLMs). The Qwen3 family consists of two MoE models, 235B-A22B (235B total parameters and 22B active parameters) and 30B-A3B, and six dense models, including the 0.6B, 1.7B, 4B, 8B, 14B, and 32B versions. With ultra-fast token generation, developers can efficiently integrate and deploy Qwen3…
The NVIDIA CUDA-X math libraries empower developers to build accelerated applications for AI, scientific computing, data processing, and more. Two of the most important applications of CUDA-X libraries are training and inference of LLMs, whether for use in everyday consumer applications or highly specialized scientific domains like drug discovery. Multiple CUDA-X libraries are indispensable…
When interacting with transformer-based models like large language models (LLMs) and vision-language models (VLMs), the structure of the input shapes the model's output. But prompts are often more than a simple user query. In practice, they optimize the response by dynamically assembling data from various sources such as system instructions, context data, and user input.
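A minimal sketch of that assembly step, with purely illustrative instructions and context, might look like this:

```python
# Illustration only: a prompt is rarely just the user's question; it is
# assembled from system instructions, retrieved context, and the query.
def assemble_messages(system_instructions: str, context_chunks: list[str], user_query: str):
    context_block = "\n\n".join(context_chunks)
    return [
        {"role": "system", "content": system_instructions},
        {"role": "user", "content": f"Context:\n{context_block}\n\nQuestion: {user_query}"},
    ]

messages = assemble_messages(
    system_instructions="You are a concise technical assistant.",
    context_chunks=["Doc snippet A ...", "Doc snippet B ..."],
    user_query="How does the retry policy work?",
)
```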
The age of passive AI is over. A new era is beginning, where AI doesn't just respond – it thinks, plans, and acts. The rapid advancement of large language models (LLMs) has unlocked the potential of agentic AI systems, enabling the automation of tedious tasks across many fields, including cybersecurity. Traditionally, AI applications in cybersecurity have focused primarily on detecting…
This is the first post in the LLM Benchmarking series, which shows how to use GenAI-Perf to benchmark the Meta Llama 3 model when deployed with NVIDIA NIM. Researchers from University College London (UCL) Deciding, Acting, and Reasoning with Knowledge (DARK) Lab leverage NVIDIA NIM microservices in their new game-based benchmark suite, Benchmarking Agentic LLM and VLM Reasoning On Games…
Large language models (LLMs) have enabled AI tools that help you write more code faster, but as we ask these tools to take on more and more complex tasks, there are limitations that become apparent. Challenges such as understanding the nuances of programming languages, complex dependencies, and adapting to codebase-specific context can lead to lower-quality code and cause bottlenecks down the line.
As many enterprises move to running AI training or inference on their data, the data and the code need to be protected, especially for large language models (LLMs). Many customers can't risk placing their data in the cloud because of data sensitivity. Such data may contain personally identifiable information (PII) or company proprietary information, and the trained model has valuable intellectual…
Enterprise data is constantly changing. This presents significant challenges for maintaining AI system accuracy over time. As organizations increasingly rely on agentic AI systems to optimize business processes, keeping these systems aligned with evolving business needs and new data becomes crucial. This post dives into how to build an iteration of a data flywheel using NVIDIA NeMo…
Large language models (LLMs) are revolutionizing how developers code and how they learn to code. For seasoned or junior developers alike, today's state-of-the-art models can generate Python scripts, React-based websites, and more. In the future, powerful AI models will assist developers in writing high-performance GPU code. This raises an important question: How can it be determined whether an LLM…
The accuracy of citations is crucial for maintaining the integrity of both academic and AI-generated content. When citations are inaccurate or wrong, they can mislead readers and spread false information. As a team of researchers from the University of Sydney specializing in machine learning and AI, we are developing an AI-powered tool capable of efficiently cross-checking and analyzing semantic…
Scientific papers are highly heterogeneous, often employing diverse terminologies for the same entities, using varied methodologies to study biological phenomena, and presenting findings within distinct contexts. Extracting meaningful insights from these papers requires a profound understanding of biology, a critical evaluation of methodologies, and the ability to discern robust findings from…
As more enterprises integrate LLMs into their applications, they face a critical challenge: LLMs can generate plausible but incorrect responses, known as hallucinations. AI guardrails – or safeguarding mechanisms enforced in AI models and applications – are a popular technique to ensure the reliability of AI applications. This post demonstrates how to build safer…
This updated post was originally published on March 18, 2025. Organizations are embracing AI agents to enhance productivity and streamline operations. To maximize their impact, these agents need strong reasoning abilities to navigate complex problems, uncover hidden connections, and make logical decisions autonomously in dynamic environments. Due to their ability to tackle complex…
The compute demands for large language model (LLM) inference are growing rapidly, fueled by the combination of increasing model sizes, real-time latency requirements, and, most recently, AI reasoning. At the same time, as AI adoption grows, the ability of an AI factory to serve as many users as possible, all while maintaining good per-user experiences, is key to maximizing the value it generates.
This is the first post in the large language model latency-throughput benchmarking series, which aims to instruct developers on common metrics used for LLM benchmarking, fundamental concepts, and how to benchmark your LLM applications. The past few years have witnessed the rise in popularity of generative AI and large language models (LLMs), as part of a broad AI revolution.
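For a sense of what those fundamental metrics capture, here is a conceptual sketch (not GenAI-Perf itself) that derives them from per-request timestamps; the record layout and values are assumptions.

```python
# Conceptual sketch: deriving common client-side LLM metrics from timing records.
from statistics import mean

# One record per request: wall-clock start, arrival of the first output token,
# arrival of the last token, and the number of output tokens generated.
records = [
    {"start": 0.00, "first_token": 0.18, "end": 1.42, "output_tokens": 96},
    {"start": 0.05, "first_token": 0.22, "end": 1.61, "output_tokens": 110},
    {"start": 0.10, "first_token": 0.31, "end": 2.05, "output_tokens": 128},
]

ttft = [r["first_token"] - r["start"] for r in records]            # time to first token
e2e_latency = [r["end"] - r["start"] for r in records]             # end-to-end request latency
gen_tput = [r["output_tokens"] / (r["end"] - r["first_token"]) for r in records]

print(f"mean TTFT: {mean(ttft):.3f} s")
print(f"mean end-to-end latency: {mean(e2e_latency):.3f} s")
print(f"mean per-request generation throughput: {mean(gen_tput):.1f} tokens/s")
```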
Since the release of ChatGPT in November 2022, the capabilities of large language models (LLMs) have surged, and the number of available models has grown exponentially. With this expansion, LLMs now vary widely in cost, performance, and specialization. For example, straightforward tasks like text summarization can be efficiently handled by smaller, general-purpose models. In contrast…
Microsoft, in collaboration with NVIDIA, announced transformative performance improvements for the Meta Llama family of models on its Azure AI Foundry platform. These advancements, enabled by NVIDIA TensorRT-LLM optimizations, deliver significant gains in throughput, reduced latency, and improved cost efficiency, all while preserving the quality of model outputs. With these improvements…
NVIDIA Virtual GPU (vGPU) technology unlocks AI capabilities within Virtual Desktop Infrastructure (VDI), making it more powerful and versatile than ever before. By powering AI-driven workloads across virtualized environments, vGPU boosts productivity, strengthens security, and optimizes performance. The latest software release empowers businesses and developers to push innovation further…
NVIDIA DGX Cloud Serverless Inference is an auto-scaling AI inference solution that enables application deployment with speed and reliability. Powered by NVIDIA Cloud Functions (NVCF), DGX Cloud Serverless Inference abstracts multi-cluster infrastructure setups across multi-cloud and on-premises environments for GPU-accelerated workloads. Whether managing AI workloads…
As AI capabilities advance, understanding the impact of hardware and software infrastructure choices on workload performance is crucial for both technical validation and business planning. Organizations need a better way to assess real-world, end-to-end AI workload performance and the total cost of ownership rather than just comparing raw FLOPs or hourly cost per GPU.
Enterprises are generating and storing more multimodal data than ever before, yet traditional retrieval systems remain largely text-focused. While they can surface insights from written content, they aren't extracting critical information embedded in tables, charts, and infographics – often the most information-dense elements of a document. Without a multimodal retrieval system…
With the release of the NVIDIA Agent Intelligence toolkit – an open-source library for connecting and optimizing teams of AI agents – developers, professionals, and researchers can create their own agentic AI applications. This tutorial shows you how to develop apps with the Agent Intelligence toolkit through an example of AI code generation. We build a test-driven coding agent using LangGraph and reasoning…
NVIDIA announced the release of NVIDIA Dynamo today at GTC 2025. NVIDIA Dynamo is a high-throughput, low-latency open-source inference serving framework for deploying generative AI and reasoning models in large-scale distributed environments. The framework boosts the number of requests served by up to 30x, when running the open-source DeepSeek-R1 models on NVIDIA Blackwell.
NVIDIA announced world-record DeepSeek-R1 inference performance at NVIDIA GTC 2025. A single NVIDIA DGX system with eight NVIDIA Blackwell GPUs can achieve over 250 tokens per second per user or a maximum throughput of over 30,000 tokens per second on the massive, state-of-the-art 671-billion-parameter DeepSeek-R1 model. These rapid advancements in performance at both ends of the performance…
Large language models (LLMs) have shown remarkable generalization capabilities in natural language processing (NLP). They are used in a wide range of applications, including translation, digital assistants, recommendation systems, context analysis, code generation, cybersecurity, and more. In automotive applications, there is growing demand for LLM-based solutions for both autonomous driving and…
Applications requiring high-performance information retrieval span a wide range of domains, including search engines, knowledge management systems, AI agents, and AI assistants. These systems demand retrieval processes that are accurate and computationally efficient to deliver precise insights, enhance user experiences, and maintain scalability. Retrieval-augmented generation (RAG) is used to…
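As a minimal illustration of the retrieval step that RAG relies on, the sketch below ranks documents by cosine similarity between embeddings; the embed() function is a stand-in assumption for a real embedding model.

```python
# Minimal retrieval sketch (illustrative only): embed documents and a query,
# rank by cosine similarity, and pass the top hits to a generator as context.
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    # Placeholder for any embedding model; returns random vectors for shape only.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 384))

def top_k(query: str, docs: list[str], k: int = 3) -> list[str]:
    doc_vecs = embed(docs)
    q_vec = embed([query])[0]
    sims = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
    return [docs[i] for i in np.argsort(-sims)[:k]]

context = top_k("What is the warranty period?", ["doc one ...", "doc two ...", "doc three ..."])
prompt = "Answer using only this context:\n" + "\n".join(context) + "\n\nQ: What is the warranty period?"
```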
As of 3/18/25, NVIDIA Triton Inference Server is now NVIDIA Dynamo. NAVER is a popular South Korean search engine company that offers Naver Place, a geo-based service that provides detailed information about millions of businesses and points of interest across Korea. Users can search for different places, leave reviews, and place bookings or orders in real time.
In today's data-driven world, the ability to retrieve accurate information from even modest amounts of data is vital for developers seeking streamlined, effective solutions for quick deployments, prototyping, or experimentation. One of the key challenges in information retrieval is managing the diverse modalities in unstructured datasets, including text, PDFs, images, tables, audio, video…
A well-crafted systematic review is often the initial step for researchers exploring a scientific field. For scientists new to this field, it provides a structured overview of the domain. For experts, it refines their understanding and sparks new ideas. In 2024 alone, 218,650 review articles were indexed in the Web of Science database, highlighting the importance of these resources in research.
Chip and hardware design presents numerous challenges stemming from its complexity and advancing technologies. These challenges result in longer turn-around time (TAT) for optimizing performance, power, area, and cost (PPAC) during synthesis, verification, physical design, and reliability loops. Large language models (LLMs) have shown a remarkable capacity to comprehend and generate natural…
Large language models (LLMs) that specialize in coding have been steadily adopted into developer workflows. From pair programming to self-improving AI agents, these models assist developers with various tasks, including enhancing code, fixing bugs, generating tests, and writing documentation. To promote the development of open-source LLMs, the Qwen team recently released Qwen2.5-Coder…
In the rapidly evolving landscape of AI systems and workloads, achieving optimal model training performance extends far beyond chip speed. It requires a comprehensive evaluation of the entire stack, from compute to networking to model framework. Navigating the complexities of AI system performance can be difficult. There are many application changes that you can make…
Translation plays an essential role in enabling companies to expand across borders, with requirements varying significantly in terms of tone, accuracy, and technical terminology handling. The emergence of sovereign AI has highlighted critical challenges in large language models (LLMs), particularly their struggle to capture nuanced cultural and linguistic contexts beyond English-dominant…
AI factories rely on more than just compute fabrics. While the East-West network connecting the GPUs is critical to AI application performance, the storage fabric – connecting high-speed storage arrays – is equally important. Storage performance plays a key role across several stages of the AI lifecycle, including training checkpointing, inference techniques such as retrieval-augmented generation…
Connect AI applications to enterprise data using embedding and reranking models for information retrieval.
Despite the success of large language models (LLMs) as general-purpose AI tools, their high demand for computational resources makes their deployment challenging in many real-world scenarios. The sizes of the model and conversation state are limited by the available high-bandwidth memory, limiting the number of users that can be served and the maximum conversation length. At present…
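A back-of-the-envelope sketch of why memory bounds both quantities, using assumed dimensions for a 70B-class model with grouped-query attention, is shown below.

```python
# Back-of-the-envelope sketch (illustrative numbers, not a specific product):
# the KV cache that holds conversation state grows linearly with sequence
# length and with concurrent users, which is what HBM capacity bounds.
def kv_cache_gib(layers, kv_heads, head_dim, seq_len, users, bytes_per_elem=2):
    # 2x for keys and values; FP16/BF16 assumed (2 bytes per element).
    total_bytes = 2 * layers * kv_heads * head_dim * seq_len * users * bytes_per_elem
    return total_bytes / 2**30

# Assumed 70B-class configuration: 80 layers, 8 KV heads of dimension 128,
# 8K-token conversations, 32 concurrent users.
print(f"{kv_cache_gib(80, 8, 128, 8192, 32):.1f} GiB of KV cache")
```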
As of 3/18/25, NVIDIA Triton Inference Server is now NVIDIA Dynamo. The explosion of AI-driven applications has placed unprecedented demands on both developers, who must balance delivering cutting-edge performance with managing operational complexity and cost, and AI infrastructure. NVIDIA is empowering developers with full-stack innovations – spanning chips, systems…
As of 3/18/25, NVIDIA Triton Inference Server is now NVIDIA Dynamo. NVIDIA NIM microservices are model inference containers that can be deployed on Kubernetes. In a production environment, it's important to understand the compute and memory profile of these microservices to set up a successful autoscaling plan. In this post, we describe how to set up and use Kubernetes Horizontal Pod…
Language models generate text by predicting the next token, given all the previous tokens including the input text tokens. Key and value elements of the previous tokens are used as historical context in LLM serving for generation of the next set of tokens. Caching these key and value elements from previous tokens avoids expensive recomputation and effectively leads to higher throughput. However…
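The sketch below illustrates the idea in framework-agnostic PyTorch: cached keys and values are appended once and reused, so each decode step projects only the newest token. It is a conceptual illustration, not a serving-engine implementation.

```python
# Conceptual sketch of KV caching during decoding: keys/values for
# already-processed tokens are stored and reused, so each decode step only
# computes projections for the single new token.
import torch

def decode_step(x_new, W_q, W_k, W_v, cache):
    q = x_new @ W_q                                       # query for the new token only
    k_new, v_new = x_new @ W_k, x_new @ W_v
    cache["k"] = torch.cat([cache["k"], k_new], dim=0)    # append, never recompute
    cache["v"] = torch.cat([cache["v"], v_new], dim=0)
    scores = torch.softmax(q @ cache["k"].T / cache["k"].shape[-1] ** 0.5, dim=-1)
    return scores @ cache["v"], cache

d = 64
W_q, W_k, W_v = (torch.randn(d, d) for _ in range(3))
cache = {"k": torch.zeros(0, d), "v": torch.zeros(0, d)}
for _ in range(5):                                        # five decode steps, one token each
    out, cache = decode_step(torch.randn(1, d), W_q, W_k, W_v, cache)
```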
The introduction of the NVIDIA Jetson Orin Nano Super Developer Kit sparked a new age of generative AI for small edge devices. The new Super Mode delivered an unprecedented generative AI performance boost of up to 1.7x on the developer kit, making it the most affordable generative AI supercomputer. JetPack 6.2 is now available to support Super Mode for Jetson Orin Nano and Jetson Orin NX…
AI agents present a significant opportunity for businesses to scale and elevate customer service and support interactions. By automating routine inquiries and enhancing response times, these agents improve efficiency and customer satisfaction, helping organizations stay competitive. However, alongside these benefits, AI agents come with risks. Large language models (LLMs) are vulnerable to…
NVIDIA is excited to announce the release of Nemotron-CC, a 6.3-trillion-token English language Common Crawl dataset for pretraining highly accurate large language models (LLMs), including 1.9 trillion tokens of synthetically generated data. One of the keys to training state-of-the-art LLMs is a high-quality pretraining dataset, and recent top LLMs, such as the Meta Llama series…
This post was originally published July 29, 2024 but has been extensively revised with NVIDIA AI Blueprint information. Traditional video analytics applications and their development workflow are typically built on fixed-function, limited models that are designed to detect and identify only a select set of predefined objects. With generative AI, NVIDIA NIM microservices…
Classifier models are specialized in categorizing data into predefined groups or classes, playing a crucial role in optimizing data processing pipelines for fine-tuning and pretraining generative AI models. Their value lies in enhancing data quality by filtering out low-quality or toxic data, ensuring only clean and relevant information feeds downstream processes. Beyond filtering…
Large language models (LLMs) are rapidly changing the business landscape, offering new capabilities in natural language processing (NLP), content generation, and data analysis. These AI-powered tools have improved how companies operate, from streamlining customer service to enhancing decision-making processes. However, despite their impressive general knowledge, LLMs often struggle with…
Recurrent drafting (referred to as ReDrafter) is a novel speculative decoding technique developed and open-sourced by Apple for large language model (LLM) inference, now available with NVIDIA TensorRT-LLM. ReDrafter helps developers significantly boost LLM workload performance on NVIDIA GPUs. NVIDIA TensorRT-LLM is a library for optimizing LLM inference. It provides an easy-to-use Python API to…
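For intuition, here is a conceptual sketch of greedy speculative decoding in general, not ReDrafter's specific recurrent drafter or the TensorRT-LLM API; the toy draft and target models are assumptions.

```python
# Conceptual sketch of greedy speculative decoding: a small draft model
# proposes several tokens, the target model checks them, and proposals are
# kept up to the first disagreement. Real systems verify all proposals in a
# single batched target pass; this sketch calls the target per token for clarity.
def speculative_step(draft_next, target_next, prefix, k=4):
    # 1) Draft model proposes k tokens autoregressively (cheap).
    ctx, proposals = list(prefix), []
    for _ in range(k):
        tok = draft_next(ctx)
        proposals.append(tok)
        ctx.append(tok)
    # 2) Target model verifies; accept until the first mismatch.
    ctx, accepted = list(prefix), []
    for tok in proposals:
        if target_next(ctx) != tok:
            break
        accepted.append(tok)
        ctx.append(tok)
    # 3) Always emit one token from the target, so progress is guaranteed.
    accepted.append(target_next(ctx))
    return prefix + accepted

# Toy "models": the target is ground truth, the draft is sometimes wrong.
target = lambda ctx: (ctx[-1] + 1) % 100
draft = lambda ctx: (ctx[-1] + 1) % 100 if ctx[-1] % 7 else 0
print(speculative_step(draft, target, [1, 2, 3]))
```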
Meta's Llama collection of open large language models (LLMs) continues to grow with the recent addition of Llama 3.3 70B, a text-only instruction-tuned model. Llama 3.3 provides enhanced performance relative to the older Llama 3.1 70B model and can even match the capabilities of the larger, more computationally expensive Llama 3.1 405B model on several tasks including math, reasoning, coding…
The generative AI landscape is rapidly evolving, with new large language models (LLMs), visual language models (VLMs), and vision language action (VLA) models emerging daily. To stay at the forefront of this transformative era, developers need a platform powerful enough to seamlessly deploy the latest models from the cloud to the edge with optimized inferencing and open ML frameworks using CUDA.
Agentic AI workflows often involve the execution of large language model (LLM)-generated code to perform tasks like creating data visualizations. However, this code should be sanitized and executed in a safe environment to mitigate risks from prompt injection and errors in the returned code. Sanitizing Python with regular expressions and restricted runtimes is insufficient…
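As a starting point only, the sketch below shows bare process-level isolation with a timeout; per the post, this is not a sufficient sandbox on its own, and the helper shown is a hypothetical illustration.

```python
# Minimal sketch of process-level isolation for LLM-generated code: run it in
# a separate interpreter with a hard timeout and capture its output.
# NOTE: this alone is NOT a sufficient sandbox. Production systems add
# container/VM isolation, network egress controls, and resource limits.
import subprocess
import sys
import tempfile

def run_untrusted(code: str, timeout_s: int = 5) -> str:
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, "-I", path],   # -I: isolated mode, ignores env and user site
            capture_output=True, text=True, timeout=timeout_s,
        )
        return result.stdout or result.stderr
    except subprocess.TimeoutExpired:
        return "error: execution timed out"

print(run_untrusted("print(sum(range(10)))"))
```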
2024 was another landmark year for developers, researchers, and innovators working with NVIDIA technologies. From groundbreaking developments in AI inference to empowering open-source contributions, these blog posts highlight the breakthroughs that resonated most with our readers. NVIDIA NIM Offers Optimized Inference Microservices for Deploying AI Models at Scale: Introduced in…
Building a multimodal retrieval-augmented generation (RAG) system is challenging. The difficulty comes from capturing and indexing information from across multiple modalities, including text, images, tables, audio, video, and more. In our previous post, An Easy Introduction to Multimodal Retrieval-Augmented Generation, we discussed how to tackle text and images. This post extends this conversation…
In today's fast-paced business environment, providing exceptional customer service is no longer just a nice-to-have – it's a necessity. Whether addressing technical issues, resolving billing questions, or providing service updates, customers expect quick, accurate, and personalized responses at their convenience. However, achieving this level of service comes with significant challenges.
As of 3/18/25, NVIDIA Triton Inference Server is now NVIDIA Dynamo. The demand for AI-enabled services continues to grow rapidly, placing increasing pressure on IT and infrastructure teams. These teams are tasked with provisioning the necessary hardware and software to meet that demand while simultaneously balancing cost efficiency with optimal user experience. This challenge was faced by the…
Building a question-answering chatbot with large language models (LLMs) is now a common workflow for text-based interactions. What about creating an AI system that can answer questions about video and image content? This presents a far more complex task. Traditional video analytics tools struggle due to their limited functionality and a narrow focus on predefined objects.
Generative AI is transforming every aspect of the automotive industry, including software development, testing, user experience, personalization, and safety. With the automotive industry shifting from a mechanically driven approach to a software-driven one, generative AI is unlocking a world of possibilities. Tata Consultancy Services (TCS) focuses on two major segments for leveraging…
Transformers, with their attention-based architecture, have become the dominant choice for language models (LMs) due to their strong performance, parallelization capabilities, and long-term recall through key-value (KV) caches. However, their quadratic computational cost and high memory demands pose efficiency challenges. In contrast, state space models (SSMs) like Mamba and Mamba-2 offer constant…
AI agents powered by large language models (LLMs) help organizations streamline and reduce manual workloads. These agents use multilevel, iterative reasoning to analyze problems, devise solutions, and execute tasks with various tools. Unlike traditional chatbots, LLM-powered agents automate complex tasks by effectively understanding and processing information. To avoid potential risks in specific…
Leading healthcare organizations are turning to generative AI to help build applications that can deliver life-saving impacts. These organizations include the Indian Institute of Technology Madras – IIT Madras Brain Centre. Advancing neuroscience research, the IIT Madras Brain Centre is using AI to generate analyses of whole human brains at a cellular level across various demographics.
Open-source large language models (LLMs) excel in English but struggle with other languages, especially the languages of Southeast Asia. This is primarily due to a lack of training data in these languages, limited understanding of local cultures, and insufficient tokens to capture unique linguistic structures and expressions. To fully meet customer needs, enterprises in non-English-speaking…
In the dynamic world of modern business, where communication and efficient workflows are crucial for success, AI-powered solutions have become a competitive advantage. AI agents, built on cutting-edge large language models (LLMs) and powered by NVIDIA NIM, provide a seamless way to enhance productivity and information flow. NIM, part of NVIDIA AI Enterprise, is a suite of easy-to-use…
The demand for ready-to-deploy high-performance inference is growing as generative AI reshapes industries. NVIDIA NIM provides production-ready microservice containers for AI model inference, constantly improving enterprise-grade generative AI performance. With the upcoming NIM version 1.4 scheduled for release in early December, request performance is improved by up to 2.4x out-of-the-box with…
In this blog post, we take a closer look at chunked prefill, a feature of NVIDIA TensorRT-LLM that increases GPU utilization and simplifies the deployment experience for developers. This builds on our previous post discussing how advanced KV cache optimization features in TensorRT-LLM improve performance up to 5x in use cases that require system prefills. When a user submits a request to…
As models grow larger and are trained on more data, they become more capable, making them more useful. To train these models quickly, more performance, delivered at data center scale, is required. The NVIDIA Blackwell platform, launched at GTC 2024 and now in full production, integrates seven types of chips: GPU, CPU, DPU, NVLink Switch chip, InfiniBand Switch, and Ethernet Switch.
Generative AI has the ability to create entirely new content that traditional machine learning (ML) methods struggle to produce. In the field of natural language processing (NLP), the advent of large language models (LLMs) specifically has led to many innovative and creative AI use cases. These include customer support chatbots, voice assistants, text summarization and translation…
5G global connections numbered nearly 2 billion earlier this year, and are projected to reach 7.7 billion by 2028. While 5G has delivered faster speeds, higher capacity, and improved latency, particularly for video and data traffic, the initial promise of creating new revenues for network operators has remained elusive. Most mobile applications are now routed to the cloud. At the same time…
Humanoid robots present a multifaceted challenge at the intersection of mechatronics, control theory, and AI. The dynamics and control of humanoid robots are complex, requiring advanced tools, techniques, and algorithms to maintain balance during locomotion and manipulation tasks. Collecting robot data and integrating sensors also pose significant challenges, as humanoid robots require a fusion of…
NVIDIA Parabricks is a scalable genomics analysis software suite that solves omics challenges with accelerated computing and deep learning to unlock new scientific breakthroughs. NVIDIA Parabricks v4.4 introduces new features and functionality including accelerated pangenome graph alignment, as announced at the American Society of Human Genetics (ASHG) national meeting. The core new feature…
Deploying large language models (LLMs) in production environments often requires making hard trade-offs between enhancing user interactivity and increasing system throughput. While enhancing user interactivity requires minimizing time to first token (TTFT), increasing throughput requires increasing tokens per second. Improving one aspect often results in the decline of the other…
In software development, testing is crucial for ensuring the quality and reliability of the final product. However, creating test plans and specifications can be time-consuming and labor-intensive, especially when managing multiple requirements and diverse test types in complex systems. Many of these tasks are traditionally performed manually by test engineers. This post is part of the…
As of 3/18/25, NVIDIA Triton Inference Server is now NVIDIA Dynamo. Large language models (LLMs) have been widely used for chatbots, content generation, summarization, classification, translation, and more. State-of-the-art LLMs and foundation models, such as Llama, Gemma, GPT, and Nemotron, have demonstrated human-like understanding and generative abilities. Thanks to these models…
Today, IBM released the third generation of IBM Granite, a collection of open language models and complementary tools. Prior generations of Granite focused on domain-specific use cases; the latest IBM Granite models meet or exceed the performance of leading similarly sized open models across both academic and enterprise benchmarks. The developer-friendly Granite 3.0 generative AI models are…
Open-source datasets have significantly democratized access to high-quality data, lowering the barriers to entry for developers and researchers to train cutting-edge generative AI models. By providing free access to diverse, high-quality, and well-curated datasets, open-source datasets enable the open-source community to train models at or close to the frontier, facilitating the rapid advancement…
As enterprises increasingly adopt AI technologies, they face a complex challenge of efficiently developing, securing, and continuously improving AI applications to leverage their data assets. They need a unified, end-to-end solution that simplifies AI development, enhances security, and enables continuous optimization, allowing organizations to harness the full potential of their data for AI…
The integration of robotic surgical assistants (RSAs) in operating rooms offers substantial advantages for both surgeons and patient outcomes. Currently operated through teleoperation by trained surgeons at a console, these surgical robot platforms provide augmented dexterity that has the potential to streamline surgical workflows and alleviate surgeon workloads. Exploring visual behavior cloning…
Mobile communication standards play a crucial role in the telecommunications ecosystem by harmonizing technology protocols to facilitate interoperability between networks and devices from different vendors. As these standards evolve, telecommunications companies face the ongoing challenge of managing complexity and volume. By leveraging generative AI, telecommunications companies can automate…
The continued growth of LLM capabilities, fueled by increasing parameter counts and support for longer contexts, has led to their usage in a wide variety of applications, each with diverse deployment requirements. For example, a chatbot supports a small number of users at very low latencies for good interactivity. Meanwhile, synthetic data generation requires high throughput to process many items…
This post was originally published August 21, 2024 but has been revised with current data. Recently, NVIDIA and Mistral AI unveiled Mistral NeMo 12B, a leading state-of-the-art large language model (LLM). Mistral NeMo 12B consistently outperforms similarly sized models on a wide range of benchmarks. We announced Mistral-NeMo-Minitron 8B, one of the most advanced open-access models in…
Updates include tensor parallel support for Mamba2, sparse mixer normalization for MoE models, and more.
With the rapid expansion of language models over the past 18 months, hundreds of variants are now available. These include large language models (LLMs), small language models (SLMs), and domain-specific models – many of which are freely accessible for commercial use. For LLMs in particular, the process of fine-tuning with custom datasets has also become increasingly affordable and straightforward.
The NVIDIA RTX AI for Windows PCs platform offers a thriving ecosystem of thousands of open-source models for application developers to leverage and integrate into Windows applications. Notably, llama.cpp is one popular tool, with over 65K GitHub stars at the time of writing. Originally released in 2023, this open-source repository is a lightweight, efficient framework for large language model…
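For context, a small usage sketch via the separate llama-cpp-python bindings (a Python wrapper around llama.cpp, not llama.cpp itself) might look as follows; the GGUF file path and parameters are assumptions.

```python
# Small usage sketch via llama-cpp-python. Path and parameters are assumed;
# n_gpu_layers=-1 offloads all layers to the GPU in a CUDA-enabled build.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3.1-8b-instruct-Q4_K_M.gguf",  # assumed local file
    n_gpu_layers=-1,
    n_ctx=4096,
)
out = llm("Q: Give me one sentence about CUDA.\nA:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```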
Game development is a complex and resource-intensive process, particularly when using advanced tools like Unreal Engine. Developers find themselves navigating through vast amounts of information, often scattered across tutorials, user manuals, API documentation, and the source code itself. This multifaceted journey requires expertise in programming, design, and project management…
In the rapidly evolving field of medicine, the integration of cutting-edge technologies is crucial for enhancing patient care and advancing research. One such innovation is retrieval-augmented generation (RAG), which is transforming how medical information is processed and used. RAG combines the capabilities of large language models (LLMs) with external knowledge retrieval…
The Llama 3.1 Nemotron 70B Reward model helps generate high-quality training data that aligns with human preferences for finance, retail, healthcare, scientific research, telecommunications, and sovereign AI.
Some of Africa's most resource-constrained farmers are gaining access to on-demand, AI-powered advice through a multimodal chatbot that gives detailed recommendations about how to increase yields or fight common pests and crop diseases. Since February, farmers in the East African nation of Malawi have had access to the chatbot, named UlangiziAI, through WhatsApp on mobile phones.
Many of the most exciting applications of large language models (LLMs), such as interactive speech bots, coding co-pilots, and search, need to begin responding to user queries quickly to deliver positive user experiences. The time that it takes for an LLM to ingest a user prompt (and context, which can be sizable) and begin outputting a response is called time to first token (TTFT).
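A minimal way to measure TTFT against any OpenAI-compatible streaming endpoint is sketched below; the endpoint URL and model name are placeholders, not details from the post.

```python
# Minimal TTFT measurement against an OpenAI-compatible streaming endpoint.
# Base URL and model name are placeholder assumptions.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

start = time.perf_counter()
stream = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Summarize attention in two sentences."}],
    stream=True,
)
first_token_at = None
for chunk in stream:
    # The first chunk carrying generated text marks time to first token.
    if first_token_at is None and chunk.choices and chunk.choices[0].delta.content:
        first_token_at = time.perf_counter()
if first_token_at is not None:
    print(f"TTFT: {first_token_at - start:.3f} s")
```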
Expanding the open-source Meta Llama collection of models, the Llama 3.2 collection includes vision language models (VLMs), small language models (SLMs), and an updated Llama Guard model with support for vision. When paired with the NVIDIA accelerated computing platform, Llama 3.2 offers developers, researchers, and enterprises valuable new capabilities and optimizations to realize their…
In the latest round of MLPerf Inference – a suite of standardized, peer-reviewed inference benchmarks – the NVIDIA platform delivered outstanding performance across the board. Among the many submissions made using the NVIDIA platform were results using the NVIDIA GH200 Grace Hopper Superchip. GH200 tightly couples an NVIDIA Grace CPU with an NVIDIA Hopper GPU using NVIDIA NVLink-C2C…
Vision-language models (VLMs) combine the powerful language understanding of foundational LLMs with the vision capabilities of vision transformers (ViTs) by projecting text and images into the same embedding space. They can take unstructured multimodal data, reason over it, and return the output in a structured format. Building on a broad base of pretraining, they can be easily adapted for…
]]>