The first release of NVIDIA NIM Operator simplified the deployment and lifecycle management of inference pipelines for NVIDIA NIM microservices, reducing the workload for MLOps and LLMOps engineers and Kubernetes admins. It enabled easy and fast deployment, auto-scaling, and upgrading of NIM on Kubernetes clusters. Learn more about the first release. Our customers and partners have been using…
Today, NVIDIA announced the open-source release of the KAI Scheduler, a Kubernetes-native GPU scheduling solution, now available under the Apache 2.0 license. Originally developed within the Run:ai platform, KAI Scheduler is now open to the community while continuing to be packaged and delivered as part of the NVIDIA Run:ai platform. This initiative underscores NVIDIA's commitment to…
At NVIDIA, we take pride in tackling complex infrastructure challenges with precision and innovation. When Volcano faced GPU underutilization in their NVIDIA DGX Cloud-provisioned Kubernetes cluster, we stepped in to deliver a solution that not only met but exceeded expectations. By combining advanced scheduling techniques with a deep understanding of distributed workloads…
Advanced AI models such as DeepSeek-R1 are proving that enterprises can now build cutting-edge AI models specialized with their own data and expertise. These models can be tailored to unique use cases, tackling diverse challenges like never before. Based on the success of early AI adopters, many organizations are shifting their focus to full-scale production AI factories. Yet the process of…
Attending KubeCon? Meet NVIDIA at booth S750, join our startup mixer, or stop by our 15+ sessions.
NVIDIA Holoscan for Media is an NVIDIA-accelerated platform designed for multi-vendor live production and AI. It will be showcased at GTC, highlighting NVIDIA NIM, AI SDKs, and microservices that enhance live production workflows. Built on Kubernetes, the container orchestration platform, it simplifies media timing, synchronization, and management through NVIDIA components such as the GPU and…
As of 3/18/25, NVIDIA Triton Inference Server is now NVIDIA Dynamo. NVIDIA NIM microservices are model inference containers that can be deployed on Kubernetes. In a production environment, it's important to understand the compute and memory profile of these microservices to set up a successful autoscaling plan. In this post, we describe how to set up and use the Kubernetes Horizontal Pod…
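To make that setup concrete, here is a minimal sketch that creates a CPU-based HorizontalPodAutoscaler for a NIM Deployment using the official Kubernetes Python client. The deployment name, namespace, and thresholds are hypothetical placeholders, and production NIM autoscaling typically keys off custom GPU or latency metrics exposed through a metrics adapter rather than the CPU metric shown here.

```python
# Minimal sketch: CPU-based HPA for a hypothetical NIM Deployment,
# created with the official Kubernetes Python client (pip install kubernetes).
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside a pod

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="nim-llm-hpa", namespace="nim"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="nim-llm"),
        min_replicas=1,
        max_replicas=4,
        # autoscaling/v1 supports CPU only; GPU or latency metrics require
        # autoscaling/v2 plus a custom metrics adapter
        target_cpu_utilization_percentage=70,
    ),
)

client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="nim", body=hpa)
```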
Organizations are increasingly turning to accelerated computing to meet the demands of generative AI, 5G telecommunications, and sovereign clouds. NVIDIA has unveiled the DOCA Platform Framework (DPF), providing foundational building blocks to unlock the power of NVIDIA BlueField DPUs and optimize GPU-accelerated computing platforms. Serving as both an orchestration framework and an implementation…
As of 3/18/25, NVIDIA Triton Inference Server is now NVIDIA Dynamo. The demand for AI-enabled services continues to grow rapidly, placing increasing pressure on IT and infrastructure teams. These teams are tasked with provisioning the necessary hardware and software to meet that demand while simultaneously balancing cost efficiency with optimal user experience. This challenge was faced by the…
As of 3/18/25, NVIDIA Triton Inference Server is now NVIDIA Dynamo. Large language models (LLMs) have been widely used for chatbots, content generation, summarization, classification, translation, and more. State-of-the-art LLMs and foundation models, such as Llama, Gemma, GPT, and Nemotron, have demonstrated human-like understanding and generative abilities. Thanks to these models…
The rapid evolution of AI models has driven the need for more efficient and scalable inferencing solutions. As organizations strive to harness the power of AI, they face challenges in deploying, managing, and scaling AI inference workloads. NVIDIA NIM and Google Kubernetes Engine (GKE) together offer a powerful solution to address these challenges. NVIDIA has collaborated with Google Cloud to…
In the rapidly evolving landscape of AI and data science, the demand for scalable, efficient, and flexible infrastructure has never been higher. Traditional infrastructure often struggles to meet the demands of modern AI workloads, leading to bottlenecks in development and deployment processes. As organizations strive to deploy AI models and data-intensive applications at scale…
Developers have shown a lot of excitement for NVIDIA NIM microservices, a set of easy-to-use cloud-native microservices that shortens time-to-market and simplifies the deployment of generative AI models anywhere: across clouds, data centers, and GPU-accelerated workstations. To meet the demands of diverse use cases, NVIDIA is bringing to market a variety of AI models…
Step-by-step guide to building robust, scalable RAG apps with Haystack and NVIDIA NIM microservices on Kubernetes.
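As a hedged illustration of the generation step in such a pipeline, the sketch below calls a NIM endpoint's OpenAI-compatible chat completions API directly with requests; the in-cluster service URL and model name are assumptions, and a real Haystack pipeline would wrap this step in its own generator component.

```python
# Minimal sketch: the generation step of a RAG app against a NIM LLM
# endpoint. NIM exposes an OpenAI-compatible HTTP API; the URL and model
# name below are hypothetical placeholders for an in-cluster deployment.
import requests

NIM_URL = "http://nim-llm.nim.svc.cluster.local:8000/v1/chat/completions"

def generate_answer(question: str, retrieved_docs: list[str]) -> str:
    context = "\n\n".join(retrieved_docs)
    resp = requests.post(NIM_URL, json={
        "model": "meta/llama3-8b-instruct",
        "messages": [
            {"role": "system",
             "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
        "max_tokens": 256,
    }, timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```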
As large language models (LLMs) continue to gain traction in enterprise AI applications, the demand for custom models that can understand and integrate specific industry terminology, domain expertise, and unique organizational requirements becomes increasingly important. To address this growing need for customizing LLMs, the NVIDIA NeMo team has announced an early access program for NeMo…
Generative AI has the potential to transform every industry. Human workers are already using large language models (LLMs) to explain, reason about, and solve difficult cognitive tasks. Retrieval-augmented generation (RAG) connects LLMs to data, expanding the usefulness of LLMs by giving them access to up-to-date and accurate information. Many enterprises have already started to explore how…
NVIDIA Holoscan for Media is a software-defined platform for building and deploying applications for live media. Recent updates introduce a user-friendly developer interface and new capabilities for application deployment to the platform. Holoscan for Media now includes Helm Dashboard, which delivers an intuitive user interface for orchestrating and managing Helm charts.
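For readers who want to try the same chart-management UI outside the platform, here is a sketch that installs the upstream open-source Helm Dashboard plugin with the helm CLI, driven from Python; it assumes helm is configured for your cluster and that the bundled tool corresponds to the komodorio project.

```python
# Minimal sketch: install and launch the open-source Helm Dashboard
# plugin via the helm CLI (assumes helm is installed and configured).
import subprocess

subprocess.run(
    ["helm", "plugin", "install",
     "https://github.com/komodorio/helm-dashboard.git"],
    check=True,
)
# Starts a local web UI for browsing and managing installed Helm releases
subprocess.run(["helm", "dashboard"], check=True)
```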
The broadcast industry is undergoing a transformation in how content is created, managed, distributed, and consumed. This transformation includes a shift from traditional linear workflows bound by fixed-function devices to flexible and hybrid, software-defined systems that enable the future of live streaming. Developers can now apply to join the early access program for NVIDIA Holoscan for…
The Dataiku platform for everyday AI simplifies deep learning. Use cases are far-reaching, from image classification to object detection and natural language processing (NLP). Dataiku helps you with labeling, model training, explainability, model deployment, and centralized management of code and code environments. This post dives into high-level Dataiku and NVIDIA integrations for image…
NVIDIA Base Command Platform provides the capabilities to confidently develop complex software that meets the performance standards required by scientific computing workflows. The platform enables both cloud-hosted and on-premises solutions for AI development by providing developers with the tools needed to efficiently configure and manage AI workflows. Integrated data and user management simplify…
Speech AI applications, from call centers to virtual assistants, rely heavily on automatic speech recognition (ASR) and text-to-speech (TTS). ASR can process the audio signal and transcribe the audio to text. Speech synthesis or TTS can generate high-quality, natural-sounding audio from the text in real time. The challenge of speech AI is to achieve high accuracy and meet the latency requirements…
Many organizations today run applications in containers to take advantage of the powerful orchestration and management provided by cloud-native platforms based on Kubernetes. However, virtual machines remain the predominant data center infrastructure platform for enterprises, and not all applications can be easily modified to run in containers. For example…
For scalable data center performance, NVIDIA GPUs have become a must-have. NVIDIA GPU parallel processing capabilities, supported by thousands of computing cores, are essential to accelerating a wide variety of applications across different industries. Today, the most compute-intensive applications across diverse industries run on GPUs, and different applications across this spectrum can…
The IT world is moving to cloud, and cloud is built on containers managed with Kubernetes. We believe the next logical step is to accelerate this infrastructure with data processing units (DPUs) for greater performance, efficiency, and security. Red Hat and NVIDIA are building an integrated cloud-ready infrastructure solution with the management and automation of Red Hat OpenShift combined…
The new year has been off to a great start with NVIDIA AI Enterprise 1.1 providing production support for container orchestration and Kubernetes cluster management using VMware vSphere with Tanzu 7.0 update 3c, delivering AI/ML workloads to every business in VMs, containers, or Kubernetes. New NVIDIA AI Enterprise labs for IT admins and MLOps are available on NVIDIA LaunchPad…
Deploying AI-powered services like voice-based assistants, e-commerce product recommendations, and contact-center automation into production at scale is challenging. Delivering the best end-user experience while reducing operational costs requires accounting for multiple factors. These include composition and performance of underlying infrastructure, flexibility to scale resources based on user…
Editor's note: Interested in GPU Operator? Register for our upcoming webinar on January 20th, "How to Easily use GPUs with Kubernetes". NVIDIA GPU Operator allows organizations to easily scale NVIDIA GPUs on Kubernetes. By simplifying the deployment and management of GPUs with Kubernetes, the GPU Operator enables infrastructure teams to scale GPU applications error-free, within minutes…
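For a sense of how little is involved, here is a sketch of the documented Helm-based install, driven from Python via subprocess; the release and namespace names are illustrative, and chart values vary by cluster (for example, whether nodes already have drivers preinstalled).

```python
# Minimal sketch: install the NVIDIA GPU Operator with Helm from Python.
# Assumes helm and kubectl are configured for the target cluster.
import subprocess

def run(*args: str) -> None:
    print("+", " ".join(args))
    subprocess.run(args, check=True)

# Add the NVIDIA Helm repository and refresh the local chart index
run("helm", "repo", "add", "nvidia", "https://helm.ngc.nvidia.com/nvidia")
run("helm", "repo", "update")

# The operator then deploys the driver, container toolkit, device plugin,
# and monitoring components across the cluster's GPU nodes
run("helm", "install", "gpu-operator", "nvidia/gpu-operator",
    "--namespace", "gpu-operator", "--create-namespace")
```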
Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. As of 3/18/25, NVIDIA Triton Inference Server is now NVIDIA Dynamo. NVIDIA Triton Inference Server is an open-source AI model serving software that simplifies the deployment of trained AI models at scale in production.
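As a quick taste of the serving workflow, the sketch below sends an image-shaped inference request to a running Triton server with the official Python HTTP client; the model and tensor names are hypothetical placeholders and must match the deployed model's config.pbtxt.

```python
# Minimal sketch: query a running Triton server over HTTP
# (pip install "tritonclient[http]" numpy).
import numpy as np
import tritonclient.http as httpclient

triton = httpclient.InferenceServerClient(url="localhost:8000")

# One 224x224 RGB image; names/shapes must match the model's config.pbtxt
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("input__0", batch.shape, "FP32")
infer_input.set_data_from_numpy(batch)

result = triton.infer(model_name="resnet50", inputs=[infer_input])
print(result.as_numpy("output__0").shape)
```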
Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. The rapid growth in artificial intelligence is driving up the size of data sets, as well as the size and complexity of networks. AI-enabled applications like e-commerce product recommendations, voice-based assistants, and contact center automation…
Editor's note: Interested in GPU Operator? Register for our upcoming webinar on January 20th, "How to Easily use GPUs with Kubernetes". In the last post, we looked at how the GPU Operator has evolved, adding a rich feature set to handle GPU discovery, support for the new Multi-Instance GPU (MIG) capability of the NVIDIA Ampere Architecture, vGPU, and certification for use with Red Hat OpenShift.
The NGC team is hosting a webinar and live Q&A. Topics include how to use containers from the NGC catalog, deployed from Google Cloud Marketplace to GKE, a managed Kubernetes service on Google Cloud, to easily build, deploy, and run AI solutions. Building a Computer Vision Service Using NVIDIA NGC and Google Cloud, August 25 at 10 a.m. PT. Organizations are using computer vision to…
Kubernetes is an open-source container-orchestration system for automating computer application deployment, scaling, and management. It's an extremely popular tool and can be used for automated rollouts and rollbacks, horizontal scaling, storage orchestration, and more. For many organizations, Kubernetes is a key component of their infrastructure. A critical step to installing and scaling…
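Once GPUs are schedulable, workloads request them as an extended resource. The sketch below submits a one-shot GPU pod with the Kubernetes Python client, assuming the NVIDIA device plugin (or GPU Operator) is advertising nvidia.com/gpu on the nodes; the CUDA image tag is illustrative.

```python
# Minimal sketch: request one NVIDIA GPU for a pod via the Kubernetes
# Python client. Assumes nvidia.com/gpu is advertised by the device plugin.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="cuda-smoke-test"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[client.V1Container(
            name="cuda",
            image="nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04",  # illustrative tag
            command=["nvidia-smi"],
            # GPUs are requested through resource limits as an extended resource
            resources=client.V1ResourceRequirements(
                limits={"nvidia.com/gpu": "1"}),
        )],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```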
Using the same orchestration on premises and in the public cloud allows a high level of agility and ease of operations: you can use the same API across bare metal and public clouds. Kubernetes is an open-source, container-orchestration system for automating the deployment, scaling, and management of containerized applications. It was originally designed by Google and is now maintained by the Cloud…
The growing prevalence of GPU-accelerated computing in the cloud, enterprise, and at the edge increasingly relies on robust and powerful network infrastructures. NVIDIA ConnectX SmartNICs and NVIDIA BlueField DPUs provide high-throughput, low-latency connectivity that enables the scaling of GPU resources across a fleet of nodes. To address the demand for cloud-native AI workloads…
Editor's note: Interested in GPU Operator? Register for our upcoming webinar on January 20th, "How to Easily use GPUs with Kubernetes". Reliably provisioning servers with GPUs in Kubernetes can quickly become complex as multiple components must be installed and managed to use GPUs. The GPU Operator, based on the Operator Framework, simplifies the initial deployment and management of GPU…
Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. High-energy physics research aims to understand the mysteries of the universe by describing the fundamental constituents of matter and the interactions between them. Diverse experiments exist on Earth to re-create the first instants of the universe.
Join the NVIDIA Triton and NVIDIA TensorRT community to stay current on the latest product updates, bug fixes, content, best practices, and more. AI and machine learning are unlocking breakthrough applications in fields such as online product recommendations, image classification, chatbots, forecasting, and manufacturing quality inspection. There are two parts to AI: training and inference.
Editor's note: Interested in GPU Operator? Register for our upcoming webinar on January 20th, "How to Easily use GPUs with Kubernetes". For many years, Docker was the only container runtime supported by Kubernetes. Over time, support for other runtimes has not only become possible but often preferred, as standardization around a common container runtime interface (CRI) has solidified in the…
Editor's note: Interested in GPU Operator? Register for our upcoming webinar on January 20th, "How to Easily use GPUs with Kubernetes". Reliably provisioning servers with GPUs can quickly become complex as multiple components must be installed and managed to use GPUs with Kubernetes. The GPU Operator simplifies the initial deployment and management and is based on the Operator Framework.
In the world of machine learning, models are trained using existing data sets and then deployed to do inference on new data. In a previous post, Simplifying and Scaling Inference Serving with NVIDIA Triton 2.3, we discussed inference workflow and the need for an efficient inference serving solution. In that post, we introduced Triton Inference Server and its benefits and looked at the new features…
Conversational AI solutions such as chatbots are now deployed in the data center, on the cloud, and at the edge to deliver lower latency and high quality of service while meeting an ever-increasing demand. The strategic decision to run AI inference on any or all these compute platforms varies not only by the use case but also evolves over time with the business. Hence…
AI, machine learning (ML), and deep learning (DL) are effective tools for solving diverse computing problems such as product recommendations, customer interactions, financial risk assessment, manufacturing defect detection, and more. Using an AI model in production, called inference serving, is the most complex part of incorporating AI in applications. Triton Inference Server takes care of all the…
Edge computing takes place close to the data source to reduce network stress and improve latency. GPUs are an ideal compute engine for edge computing because they are programmable and deliver phenomenal performance per dollar. However, the complexity associated with managing a fleet of edge devices can erode the GPU's favorable economics. In 2019, NVIDIA introduced the GPU Operator to…
Seamlessly deploying AI services at scale in production is as critical as creating the most accurate AI model. Conversational AI services, for example, need multiple models handling functions of automatic speech recognition (ASR), natural language understanding (NLU), and text-to-speech (TTS) to complete the application pipeline. To provide real-time conversation to users…
Modern expectations for agile capabilities and constant innovation, with zero downtime, call for a change in how software for embedded and edge devices is developed and deployed. Adopting cloud-native paradigms like microservices, containerization, and container orchestration at the edge is the way forward, but the complexity of deployment, management, and security gets in the way of scaling.
This post was originally published on the Mellanox blog. In my previous Kubernetes post, Provision Bare-Metal Kubernetes Like a Cloud Giant!, I discussed the benefits of using BlueField DPU-programmable SmartNICs to simplify provisioning of Kubernetes clusters in bare-metal infrastructures. A key takeaway from this post was the current rapid shift toward bare metal Kubernetes…
Editor's note: Interested in GPU Operator? Register for our upcoming webinar on January 20th, "How to Easily use GPUs with Kubernetes". Over the last few years, NVIDIA has leveraged GPU containers in a variety of ways for testing, development, and running AI workloads in production at scale. Containers optimized for NVIDIA GPUs and systems such as the DGX and OEM NGC-Ready servers are available…
Data collected on a vast scale has fundamentally changed the way organizations do business, driving demand for teams to provide meaningful data science, machine learning, and deep learning-based business insights quickly. Data science leaders, plus the DevOps and IT teams supporting them, constantly look for ways to make their teams productive while optimizing their costs and minimizing…
The software industry has recently seen a huge shift in how software deployments are done, thanks to technologies such as containers and orchestrators. While container technologies have been around for a while, credit goes to Docker for making containers mainstream by greatly simplifying the process of creating, managing, and deploying containerized applications. We're now seeing a similar paradigm shift for…
For the latest information about how to deploy Kubernetes on NVIDIA GPUs, see the Kubernetes section of the NVIDIA Data Center documentation. NVIDIA GPU Cloud (NGC) provides access to a number of containers for deep learning, HPC, and HPC visualization, as well as containers with applications from our NVIDIA partners, all optimized for NVIDIA GPUs and DGX systems.
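Before moving to Kubernetes, the simplest way to exercise an NGC container is directly on a GPU host with Docker. The sketch below assumes Docker 19.03+ with the NVIDIA Container Toolkit installed; the image tag is illustrative, so browse ngc.nvidia.com for current tags.

```python
# Minimal sketch: run an NGC deep learning container on a GPU host.
# Assumes Docker 19.03+ and the NVIDIA Container Toolkit; tag is illustrative.
import subprocess

subprocess.run([
    "docker", "run", "--rm", "--gpus", "all",
    "nvcr.io/nvidia/pytorch:24.05-py3",
    "python", "-c", "import torch; print(torch.cuda.is_available())",
], check=True)
```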