NVIDIA recently released cuEmbed, a high-performance, header-only CUDA library that accelerates embedding lookups on NVIDIA GPUs. If you’re building recommendation systems, embedding operations are likely consuming significant computational resources. Embedding lookups present a unique optimization challenge: they’re memory-intensive operations with irregular access patterns.
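To make that access pattern concrete, here is a minimal NumPy sketch of the operation cuEmbed accelerates: a batched gather over a large embedding table followed by pooling. The array names, shapes, and hotness value are illustrative assumptions, not the cuEmbed API.

```python
import numpy as np

# Illustrative sketch of a pooled embedding lookup: the table, shapes, and
# "hotness" (lookups per sample) below are hypothetical, not cuEmbed's API.
num_categories, embed_dim = 100_000, 128
embedding_table = np.random.rand(num_categories, embed_dim).astype(np.float32)

batch_size, hotness = 4, 8
indices = np.random.randint(0, num_categories, size=(batch_size, hotness))

# The gather reads scattered rows (irregular, data-dependent addresses),
# which is why the operation is memory-bound rather than compute-bound.
gathered = embedding_table[indices]   # (batch, hotness, embed_dim)
pooled = gathered.sum(axis=1)         # sum pooling -> (batch, embed_dim)
print(pooled.shape)                   # (4, 128)
```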
AI is rapidly moving beyond centralized cloud and data centers, becoming a powerful tool deployable directly on professional workstations. Thanks to advanced hardware and optimized software, you can build, run, and experiment with sophisticated AI models at your desk or on the go. Welcome to the world of local AI development! Running and developing AI locally on a workstation offers…
This is the first post in the LLM Benchmarking series, which shows how to use GenAI-Perf to benchmark the Meta Llama 3 model when deployed with NVIDIA NIM. Researchers from the University College London (UCL) Deciding, Acting, and Reasoning with Knowledge (DARK) Lab leverage NVIDIA NIM microservices in their new game-based benchmark suite, Benchmarking Agentic LLM and VLM Reasoning On Games…
Missed GTC? This year’s training labs are now available on demand to watch anywhere, anytime.
State-of-the-art image diffusion models take tens of seconds to process a single image. This makes video diffusion even more challenging, requiring significant computational resources and incurring high costs. By leveraging the latest FP8 quantization features on NVIDIA Hopper GPUs with NVIDIA TensorRT, it’s possible to significantly reduce inference costs and serve more users with fewer GPUs.
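As a rough illustration of the scaling math behind FP8 post-training quantization (not TensorRT’s actual calibration pipeline), the sketch below maps a tensor’s dynamic range onto the FP8 E4M3 format’s maximum of 448; the float16 cast is a stand-in for FP8’s reduced precision.

```python
import numpy as np

# Rough illustration of per-tensor FP8 (E4M3) scaling; this is not the
# TensorRT API. E4M3's largest finite value is 448, so a symmetric scale
# maps the tensor's observed range onto [-448, 448].
FP8_E4M3_MAX = 448.0

def fake_quant_fp8(x: np.ndarray) -> np.ndarray:
    scale = np.abs(x).max() / FP8_E4M3_MAX
    # float16 here only mimics precision loss; real FP8 kernels store 8-bit
    # values and carry `scale` separately for dequantization.
    x_q = (x / scale).astype(np.float16)
    return x_q.astype(np.float32) * scale  # dequantize back for comparison

weights = np.random.randn(4096).astype(np.float32)
err = np.abs(weights - fake_quant_fp8(weights)).max()
print(f"max round-trip error: {err:.5f}")
```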
Build a high-performance agentic AI system using the open-source NVIDIA Agent Intelligence toolkit — contest runs May 12 to May 23.
The release of NVIDIA OptiX 9.0 introduces a new feature called cooperative vectors that enables AI workflows as part of ray tracing kernels. The feature leverages NVIDIA RTX Tensor Cores for hardware-accelerated matrix operations and neural net computations during shading. This unlocks AI rendering techniques such as NVIDIA RTX Neural Shaders and NVIDIA RTX Neural Texture Compression (NTC) and…
The accuracy of citations is crucial for maintaining the integrity of both academic and AI-generated content. When citations are inaccurate or wrong, they can mislead readers and spread false information. As a team of researchers from the University of Sydney specializing in machine learning and AI, we are developing an AI-powered tool capable of efficiently cross-checking and analyzing semantic…
AI is no longer just about generating text or images—it’s about deep reasoning, detailed problem-solving, and powerful adaptability for real-world applications in business and in financial, customer, and healthcare services. Available today, the latest Llama Nemotron Ultra reasoning model from NVIDIA delivers leading accuracy among open-source models across intelligence and coding benchmarks…
The worldwide adoption of generative AI has driven massive demand for accelerated compute hardware. In enterprises, this has sped up the deployment of accelerated private cloud infrastructure. At the regional level, this demand for compute infrastructure has given rise to a new category of cloud providers who offer accelerated compute (GPU) capacity for AI workloads, also known as GPU…
Today, NVIDIA announced the open-source release of the KAI Scheduler, a Kubernetes-native GPU scheduling solution, now available under the Apache 2.0 license. Originally developed within the Run:ai platform, KAI Scheduler is now available to the community while also continuing to be packaged and delivered as part of the NVIDIA Run:ai platform. This initiative underscores NVIDIA’s commitment to…
At NVIDIA, we take pride in tackling complex infrastructure challenges with precision and innovation. When Volcano faced GPU underutilization in their NVIDIA DGX Cloud-provisioned Kubernetes cluster, we stepped in to deliver a solution that not only met but exceeded expectations. By combining advanced scheduling techniques with a deep understanding of distributed workloads…
Since the release of ChatGPT in November 2022, the capabilities of large language models (LLMs) have surged, and the number of available models has grown exponentially. With this expansion, LLMs now vary widely in cost, performance, and specialization. For example, straightforward tasks like text summarization can be efficiently handled by smaller, general-purpose models. In contrast…
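The cost/performance spread described above is what motivates routing each request to the smallest model that can handle it. The sketch below is hypothetical: the model names, keyword classifier, and routing table are illustrative stand-ins, not a specific NVIDIA component.

```python
# Hypothetical complexity-based router. The model names, routing table, and
# keyword classifier are illustrative stand-ins, not a specific NVIDIA API;
# production routers typically use a trained classifier instead of rules.
ROUTES = {
    "simple": "small-llm-8b",    # summarization, extraction, rewriting
    "complex": "large-llm-70b",  # multi-step reasoning, code generation
}

def classify_task(prompt: str) -> str:
    reasoning_cues = ("prove", "step by step", "debug", "derive")
    return "complex" if any(c in prompt.lower() for c in reasoning_cues) else "simple"

def route(prompt: str) -> str:
    return ROUTES[classify_task(prompt)]

print(route("Summarize this article in two sentences."))   # small-llm-8b
print(route("Derive the gradient update step by step."))   # large-llm-70b
```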
Large language models (LLMs) often struggle with accuracy when handling domain-specific questions, especially those requiring multi-hop reasoning or access to proprietary data. While retrieval-augmented generation (RAG) can help, traditional vector search methods often fall short. In this tutorial, we show you how to implement GraphRAG in combination with fine-tuned GNN+LLM models to achieve…
Advanced AI models such as DeepSeek-R1 are proving that enterprises can now build cutting-edge AI models specialized with their own data and expertise. These models can be tailored to unique use cases, tackling diverse challenges like never before. Based on the success of early AI adopters, many organizations are shifting their focus to full-scale production AI factories. Yet the process of…
NVIDIA cloud gaming service GeForce NOW is providing developers and publishers with new tools to bring their games to more gamers—and offer new experiences only possible through the cloud. These tools lower local GPU requirements to expand reach and reduce costs by offloading AI inference tasks to the cloud. At the Game Developers Conference (GDC) 2025, NVIDIA demonstrated hybrid AI…
NVIDIA announced at GTC 2025 the release of NVIDIA Holoscan 3.0, the real-time AI sensor processing platform. This latest version provides dynamic flow control, empowering developers to design more robust, scalable, and efficient systems. With physical AI rapidly evolving, Holoscan 3.0 is built to adapt, making it easier than ever to tackle the challenges of today’s dynamic environments.
NVIDIA Virtual GPU (vGPU) technology unlocks AI capabilities within Virtual Desktop Infrastructure (VDI), making it more powerful and versatile than ever before. By powering AI-driven workloads across virtualized environments, vGPU boosts productivity, strengthens security, and optimizes performance. The latest software release empowers businesses and developers to push innovation further…
For years, advancements in AI have followed a clear trajectory through pretraining scaling: larger models, more data, and greater computational resources lead to breakthrough capabilities. In the last 5 years, pretraining scaling has increased compute requirements by an incredible factor of 50 million. However, building more intelligent systems is no longer just about pretraining bigger models.
The growing volume and complexity of medical data—and the pressing need for early disease diagnosis and improved healthcare efficiency—are driving unprecedented advancements in medical AI. Among the most transformative innovations in this field are multimodal AI models that simultaneously process text, images, and video. These models offer a more comprehensive understanding of patient data than…
NVIDIA DGX Cloud Serverless Inference is an auto-scaling AI inference solution that enables application deployment with speed and reliability. Powered by NVIDIA Cloud Functions (NVCF), DGX Cloud Serverless Inference abstracts multi-cluster infrastructure setups across multi-cloud and on-premises environments for GPU-accelerated workloads. Whether managing AI workloads…
As AI capabilities advance, understanding the impact of hardware and software infrastructure choices on workload performance is crucial for both technical validation and business planning. Organizations need a better way to assess real-world, end-to-end AI workload performance and the total cost of ownership rather than just comparing raw FLOPs or hourly cost per GPU.
Large language models (LLMs) have shown remarkable generalization capabilities in natural language processing (NLP). They are used in a wide range of applications, including translation, digital assistants, recommendation systems, context analysis, code generation, cybersecurity, and more. In automotive applications, there is growing demand for LLM-based solutions for both autonomous driving and…
Training AI models on massive GPU clusters presents significant challenges for model builders. Because manual intervention becomes impractical as job scale increases, automation is critical to maintaining high GPU utilization and training productivity. An exceptional training experience requires resilient systems that provide low-latency error attribution and automatic failover based on root…
According to the World Health Organization (WHO), 3.6 billion medical imaging tests are performed every year globally to diagnose, monitor, and treat various conditions. Most of these images are stored in a globally recognized standard called DICOM (Digital Imaging and Communications in Medicine). Imaging studies in DICOM format are a combination of unstructured images and structured metadata.
Large language models (LLMs) have permeated every industry and changed the potential of technology. However, due to their massive size, they are impractical under the resource constraints many companies face. Small language models (SLMs) bridge the gap between quality and cost by offering a smaller resource footprint. SLMs are a subset of language models that tend to…
NVIDIA AI Enterprise is the cloud-native software platform for the development and deployment of production-grade AI solutions. The latest release of the NVIDIA AI Enterprise infrastructure software collection adds support for the latest NVIDIA data center GPU, NVIDIA H200 NVL, giving your enterprise new options for powering cutting-edge use cases such as agentic and generative AI with some of the…
Traditional design and engineering workflows in the manufacturing industry have long been characterized by a sequential, iterative approach that is often time-consuming and resource intensive. These conventional methods typically involve stages such as requirement gathering, conceptual design, detailed design, analysis, prototyping, and testing, with each phase dependent on the results of previous…
Flooding poses a significant threat to 1.5 billion people, making it the most common cause of major natural disasters. Floods cause up to $25 billion in global economic damage every year. Flood forecasting is a critical tool in disaster preparedness and risk mitigation. Numerical methods that provide accurate simulations of river basins have long been in development. With these, engineers such as those…
In the rapidly evolving landscape of AI systems and workloads, achieving optimal model training performance extends far beyond chip speed. It requires a comprehensive evaluation of the entire stack, from compute to networking to model framework. Navigating the complexities of AI system performance can be difficult. There are many application changes that you can make…
Experience high-performance inference, usability, intuitive APIs, easy debugging with eager mode, clear error messages, and more.
Connect AI applications to enterprise data using embedding and reranking models for information retrieval.
In recent years, large language models (LLMs) have achieved extraordinary progress in areas such as reasoning, code generation, machine translation, and summarization. However, despite their advanced capabilities, foundation models have limitations when it comes to domain-specific expertise such as finance or healthcare or capturing cultural and language nuances beyond English.
Generative AI has revolutionized how people bring ideas to life, and agentic AI represents the next leap forward in this technological evolution. By leveraging sophisticated, autonomous reasoning and iterative planning, AI agents can tackle complex, multistep problems with remarkable efficiency. As AI continues to revolutionize industries, the demand for running AI models locally has surged.
Organizations are increasingly turning to accelerated computing to meet the demands of generative AI, 5G telecommunications, and sovereign clouds. NVIDIA has unveiled the DOCA Platform Framework (DPF), providing foundational building blocks to unlock the power of NVIDIA BlueField DPUs and optimize GPU-accelerated computing platforms. Serving as both an orchestration framework and an implementation…
Generative AI has evolved from text-based models to multimodal models, with a recent expansion into video, opening up new potential uses across various industries. Video models can create new experiences for users or simulate scenarios for training autonomous agents at scale. They are helping revolutionize various industries including robotics, autonomous vehicles, and entertainment.
This white paper details our commitment to securing the NVIDIA AI Enterprise software stack. It outlines the processes and measures NVIDIA takes to ensure container security.
2024 was another landmark year for developers, researchers, and innovators working with NVIDIA technologies. From groundbreaking developments in AI inference to empowering open-source contributions, these blog posts highlight the breakthroughs that resonated most with our readers. “NVIDIA NIM Offers Optimized Inference Microservices for Deploying AI Models at Scale”: Introduced in…
Researchers from Weill Cornell Medicine have developed an AI-powered model that could help couples undergoing in vitro fertilization (IVF) and guide embryologists in selecting healthy embryos for implantation. Recently published in Nature Communications, the study presents the Blastocyst Evaluation Learning Algorithm (BELA). This state-of-the-art deep learning model evaluates embryo quality and…
As of 3/18/25, NVIDIA Triton Inference Server is now NVIDIA Dynamo. The demand for AI-enabled services continues to grow rapidly, placing increasing pressure on IT and infrastructure teams. These teams are tasked with provisioning the necessary hardware and software to meet that demand while simultaneously balancing cost efficiency with optimal user experience. This challenge was faced by the…
Christopher Bretherton, Senior Director of Climate Modeling at the Allen Institute for AI (AI2), highlights how AI is revolutionizing climate science. In this NVIDIA GTC 2024 session, Bretherton presents advancements in machine learning-based emulators for predicting regional climate changes and precipitation extremes. These tools accelerate climate modeling, making it faster, more efficient…
Antibodies have become the most prevalent class of therapeutics, primarily due to their ability to target specific antigens, enabling them to treat a wide range of diseases, from cancer to autoimmune disorders. Their specificity reduces the likelihood of off-target effects, making them safer and often more effective than small-molecule drugs for complex conditions. As a result…
NVIDIA TensorRT-LLM support for speculative decoding now delivers more than a 3x speedup in total token throughput. TensorRT-LLM is an open-source library that provides blazing-fast inference support for numerous popular large language models (LLMs) on NVIDIA GPUs. By adding support for speculative decoding on single-GPU and single-node multi-GPU configurations, the library further expands its supported…
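Conceptually, speculative decoding has a small draft model propose several tokens that the large target model then verifies. The greedy sketch below is illustrative only: `draft_next` and `target_argmax` are hypothetical stand-ins, not TensorRT-LLM calls, and a real engine verifies all draft positions in one batched target forward pass rather than one call per position.

```python
# Hypothetical greedy speculative decoding loop. `draft_next` and
# `target_argmax` each return the next token id for a context; they stand in
# for the draft and target models and are not TensorRT-LLM APIs.
def speculative_step(tokens, draft_next, target_argmax, k=4):
    # 1) Draft model cheaply proposes k candidate tokens.
    draft, ctx = [], list(tokens)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)
    # 2) Target model checks each draft position (a real engine gets all k
    #    verdicts from one batched forward pass; we query per position here).
    verified = [target_argmax(tokens + draft[:i]) for i in range(k)]
    # 3) Keep the longest agreeing prefix; on the first mismatch, take the
    #    target's token instead, so output always matches greedy decoding.
    accepted = []
    for d, v in zip(draft, verified):
        if d != v:
            accepted.append(v)
            break
        accepted.append(d)
    return tokens + accepted

# Toy demo: the draft always guesses 1; the target agrees until the context
# reaches length 6, then insists on 2. Two draft tokens are accepted.
out = speculative_step([0, 0, 0, 0],
                       draft_next=lambda ctx: 1,
                       target_argmax=lambda ctx: 1 if len(ctx) < 6 else 2)
print(out)  # [0, 0, 0, 0, 1, 1, 2]
```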
Generative AI models are advancing rapidly. Every generation of models comes with a larger number of parameters and longer context windows. The Llama 2 series of models introduced in July 2023 had a context length of 4K tokens, and the Llama 3.1 models, introduced only a year later, dramatically expanded that to 128K tokens. While long context lengths allow models to perform cognitive tasks…
For organizations adapting AI foundation models with domain-specific data, the ability to rapidly create and deploy fine-tuned models is key to efficiently delivering value with enterprise generative AI applications. NVIDIA NIM offers prebuilt, performance-optimized inference microservices for the latest AI foundation models, including seamless deployment of models customized using parameter…
In our previous blog post, we demonstrated how reusing the key-value (KV) cache by offloading it to CPU memory can accelerate time to first token (TTFT) by up to 14x on x86-based NVIDIA H100 Tensor Core GPUs and 28x on the NVIDIA GH200 Superchip. In this post, we shed light on KV cache reuse techniques and best practices that can drive even further TTFT speedups. LLMs are rapidly…
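At its core, KV cache reuse means recognizing a previously seen prompt prefix and restoring its cached key-value tensors instead of recomputing prefill. The toy sketch below assumes a dict as the CPU offload tier and a hypothetical `run_prefill` function; real systems match block-aligned prefixes and manage GPU-CPU transfers, which this omits.

```python
import hashlib

# Toy sketch of prefix-keyed KV cache reuse. The dict stands in for a CPU
# offload tier, and `run_prefill` is a hypothetical callable returning the
# key-value tensors for a prompt; neither is a TensorRT-LLM API.
cpu_kv_store = {}

def get_kv(prompt: str, run_prefill):
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in cpu_kv_store:            # hit: skip prefill, improving TTFT
        return cpu_kv_store[key]
    kv = run_prefill(prompt)           # miss: pay the full prefill cost once
    cpu_kv_store[key] = kv             # offload for later requests
    return kv

# Repeated system prompts (a common case) hit the cache on the second call.
kv1 = get_kv("You are a helpful assistant.", lambda p: ("k", "v", len(p)))
kv2 = get_kv("You are a helpful assistant.", lambda p: ("k", "v", len(p)))
assert kv1 is kv2  # second call reused the stored cache entry
```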
Deploying generative AI workloads in production environments where user numbers can fluctuate from hundreds to hundreds of thousands – and where input sequence lengths differ with each request – poses unique challenges. To achieve low latency inference in these environments, multi-GPU setups are a must – irrespective of the GPU generation or its memory capacity. To enhance inference performance in…
AI agents are emerging as the newest way for organizations to increase efficiency, improve productivity, and accelerate innovation. These agents are more advanced than prior AI applications, with the ability to autonomously reason through tasks, call out to other tools, and incorporate both enterprise data and employee knowledge to produce valuable business outcomes. They’re being embedded into…