
    Powering the Next Frontier of Networking for AI Platforms with NVIDIA DOCA 3.0

    The NVIDIA DOCA framework has evolved to become a vital component of next-generation AI infrastructure. From its initial release to the highly anticipated launch of NVIDIA DOCA 3.0, each version has expanded capabilities for NVIDIA BlueField DPUs and ConnectX SuperNICs, enabling unprecedented AI platform scalability and performance. 

    DOCA leverages BlueField DPUs and SuperNICs through a rich ecosystem of libraries and services, enabling hyperscale deployments that exceed 100K GPUs while maintaining strict tenant isolation and optimized resource utilization. The security features of DOCA provide hardware-level threat detection for containerized AI workloads without performance penalties. The intelligent data acceleration capabilities of DOCA address critical bottlenecks in AI data pipelines, while its orchestration features simplify deployment of complex, DPU-accelerated services. 

    This post introduces DOCA 3.0, which represents the culmination of these advancements. Offering new and improved infrastructure services for AI factories as well as an optimized framework for AI data center infrastructure, DOCA 3.0 provides developers the essential tools required to build secure, efficient AI infrastructure at unprecedented scale. With a broad and thriving developer community already leveraging DOCA, this technology continues transforming how organizations deploy, manage, and orchestrate the infrastructure powering tomorrow’s AI innovations.

    Introducing DOCA 3.0

    In today’s rapidly evolving AI landscape, the infrastructure supporting large-scale AI deployments is as critical as the models themselves. As organizations scale from experimental AI projects to production-ready deployments, the underlying compute, networking, and storage infrastructure must evolve to meet unprecedented demands. At the heart of this evolution lies DOCA, which is revolutionizing how developers build, deploy, and manage next-generation AI platforms.

    The latest release, DOCA 3.0, provides developers with extensive libraries, drivers, and APIs to create high-performance applications and services for NVIDIA BlueField DPUs and ConnectX SuperNICs. This framework enables the offloading of resource-intensive tasks from CPUs to dedicated hardware accelerators, dramatically improving performance, security, and efficiency across AI workloads. 

    Highlights of DOCA 3.0 include:  

    • DOCA support for InfiniBand Quantum-X800 with ConnectX-8 SuperNICs (GA)
    • New DOCA Argus service for NIM container threat detection
    • DOCA Platform Framework (DPF) trusted-host use case (GA)
    • DOCA SNAP Virtio-fs (Beta) file system emulation using BlueField-3
    • DOCA Perftest (GA), an RDMA benchmark tool for AI compute clusters

    For full details, see the DOCA 3.0 release notes.

    A diagrammatic representation of NVIDIA DOCA, highlighting the various services, libraries and drivers which make up the DOCA framework.
    Figure 1. NVIDIA DOCA 3.0 stack

    Hyperscale GPU computing: Scaling multitenant AI factories 

    The race to build larger AI models with more parameters and training data has pushed computing requirements to unprecedented levels. Modern AI factories must support massive-scale deployments spanning tens of thousands of GPUs while maintaining strict performance isolation between tenants.

    DOCA addresses this challenge through its networking libraries that enable efficient resource utilization and workload isolation in multitenant environments. Specifically, the DOCA RDMA Library provides high-performance, low-latency communication capabilities essential for large-scale distributed AI training. This library enables direct memory access between nodes without CPU involvement, significantly reducing communication overhead in multi-GPU systems.
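
    DOCA RDMA follows the same verbs-style, one-sided semantics as standard InfiniBand programming. The minimal sketch below uses the well-known libibverbs API rather than the DOCA RDMA library itself, and assumes the queue pair and remote memory keys were already exchanged out of band; it illustrates why a one-sided RDMA write involves no remote CPU.

    // Illustrative one-sided RDMA write using standard libibverbs
    // (an analogue of the semantics DOCA RDMA offloads to the DPU;
    // not the DOCA RDMA API itself). Assumes queue pair `qp` is
    // already connected and `remote_addr`/`rkey` came from the peer.
    #include <infiniband/verbs.h>
    #include <stdint.h>
    #include <stddef.h>

    int post_rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
                        void *local_buf, size_t len,
                        uint64_t remote_addr, uint32_t rkey)
    {
        struct ibv_sge sge = {
            .addr   = (uintptr_t)local_buf, // registered local buffer
            .length = (uint32_t)len,
            .lkey   = mr->lkey,             // local protection key
        };
        struct ibv_send_wr wr = {0};
        struct ibv_send_wr *bad_wr = NULL;

        wr.opcode              = IBV_WR_RDMA_WRITE; // one-sided write
        wr.sg_list             = &sge;
        wr.num_sge             = 1;
        wr.send_flags          = IBV_SEND_SIGNALED;  // request completion
        wr.wr.rdma.remote_addr = remote_addr;        // peer's memory
        wr.wr.rdma.rkey        = rkey;               // peer's remote key

        // The NIC moves the data directly into remote memory; the
        // remote CPU is never interrupted, which is what cuts the
        // communication overhead in large multi-GPU systems.
        return ibv_post_send(qp, &wr, &bad_wr);
    }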

    The DOCA GPUNetIO Library further enhances GPU-to-GPU communication by providing direct data paths between GPUs across the network through GPUDirect Async Kernel-Initiated communication (GDAKI), enabling the efficient collective operations critical for distributed training algorithms. Working in conjunction with DOCA Ethernet, DOCA RDMA, or DOCA DMA, these libraries create a high-performance networking foundation that can scale to support deployments beyond 100K GPUs.

    Traditional software-defined data center approaches can consume 30% or more of server CPU cores. By offloading these functions to BlueField DPUs through libraries like DOCA Flow, DOCA liberates valuable CPU resources for AI computations while providing performance equivalent to more than 30 CPU cores. The DOCA Flow Library enables sophisticated packet processing and flow management, supporting the complex traffic patterns in large-scale AI factories.
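
    To make the offload model concrete, the sketch below shows the generic match-action pattern that hardware flow engines implement. The structs and the install_rule function are illustrative only, not the DOCA Flow API; they exist to show the shape of a flow rule that, once installed on the DPU, costs the host zero CPU cycles per packet.

    // Conceptual match-action flow rule (illustrative only; not the
    // DOCA Flow API). Hardware flow engines match packet headers and
    // apply an action entirely on the DPU, so no host CPU cycles are
    // spent per packet once the rule is installed.
    #include <stdint.h>
    #include <stdio.h>

    struct flow_match {            // fields to match on (5-tuple subset)
        uint32_t src_ip, dst_ip;
        uint16_t dst_port;
        uint8_t  ip_proto;         // e.g. 6 = TCP
    };

    enum flow_action { FWD_TO_VM, FWD_TO_WIRE, DROP };

    struct flow_rule {
        struct flow_match match;
        enum flow_action  action;
        uint32_t          priority; // lower value = matched first
    };

    // In a real offload library this call would program the NIC's
    // embedded switch; here it just records the rule to show the
    // shape of the interface.
    static void install_rule(const struct flow_rule *r)
    {
        printf("rule: proto=%u dst_port=%u action=%d prio=%u\n",
               r->match.ip_proto, r->match.dst_port, r->action,
               r->priority);
    }

    int main(void)
    {
        // Steer a tenant's TCP/443 traffic to its VM queue. Rules
        // like these replace the per-packet work a software vSwitch
        // would otherwise do on host cores.
        struct flow_rule https_rule = {
            .match = { .ip_proto = 6, .dst_port = 443 },
            .action = FWD_TO_VM, .priority = 0,
        };
        install_rule(&https_rule);
        return 0;
    }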

    Multitenant isolation for AI workloads

    The DOCA architecture provides robust isolation mechanisms through its Host-Based Networking service, which ensures that workloads from different tenants remain securely separated. This service implements hardware-enforced barriers between tenant environments, preventing unauthorized access while allowing AI workloads to run unimpeded. This capability is essential for cloud providers and enterprises running sensitive AI workloads alongside other applications.

    Robust threat detection: Protecting AI workloads in real time

    As AI systems become more critical to business operations, securing them against threats becomes paramount. DOCA unlocks the cybersecurity potential of BlueField DPUs and SuperNICs, enabling rapid creation and integration of applications that offload and accelerate security tasks including encryption, distributed firewalls, intrusion detection, and network microsegmentation.

    Using a combination of DOCA libraries, the NVIDIA cybersecurity AI platform leverages hardware-level inspection to provide deep visibility into network traffic and system behaviors. Unlike conventional security solutions that rely on software agents, BlueField DPUs act as embedded security processors that offload critical cybersecurity tasks from traditional CPUs. This approach allows for real-time monitoring and protection without impacting system performance.

    DOCA-powered security applications can:

    • Continuously analyze telemetry data to identify patterns and anomalies indicating potential threats (a simplified sketch of this idea follows the list)
    • Deliver real-time threat detection through AI-driven anomaly detection
    • Proactively mitigate risks before they escalate into major security incidents
    • Implement robust encryption and secure communication channels between AI components
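
    As a concrete, deliberately simplified illustration of the first capability, the sketch below implements baseline-and-deviation anomaly detection over a single telemetry counter in plain C. Production DOCA-based pipelines feed hardware telemetry into far richer AI models; only the principle is shown here.

    // Minimal, library-agnostic sketch of telemetry anomaly detection:
    // an exponentially weighted moving average (EWMA) baseline with a
    // deviation threshold. Not DOCA code; it only shows the principle.
    #include <math.h>
    #include <stdio.h>

    struct baseline {
        double mean;   // EWMA of the metric
        double var;    // EWMA of squared deviation
        double alpha;  // smoothing factor, 0 < alpha < 1
    };

    // Returns 1 if the sample deviates more than 4 sigma from baseline.
    static int observe(struct baseline *b, double sample)
    {
        double sigma = sqrt(b->var) + 1e-9;       // avoid divide-by-zero
        int anomalous = fabs(sample - b->mean) > 4.0 * sigma;
        if (!anomalous) {                         // learn only from normal traffic
            double d = sample - b->mean;
            b->mean += b->alpha * d;
            b->var   = (1.0 - b->alpha) * (b->var + b->alpha * d * d);
        }
        return anomalous;
    }

    int main(void)
    {
        struct baseline pkts = { .mean = 1000.0, .var = 100.0, .alpha = 0.1 };
        double window[] = { 990, 1010, 1005, 998, 9500, 1002 }; // one spike
        for (int i = 0; i < 6; i++)
            if (observe(&pkts, window[i]))
                printf("anomaly at sample %d: %.0f pkts/s\n", i, window[i]);
        return 0;
    }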

    For AI workloads, this security architecture provides significant advantages. AI models and training data often represent valuable intellectual property, making them prime targets for attacks. DOCA enables confidential computing capabilities that preserve the confidentiality and integrity of AI models, algorithms, and data deployed on NVIDIA Blackwell and NVIDIA Hopper GPUs.

    Additionally, DOCA security features address the complex threat landscape created by the convergence of IT and OT systems in AI-driven environments. By providing hardware-accelerated security functions, DOCA ensures that security measures don’t become a performance bottleneck for AI applications.

    Accelerating data processing for next-wave AI

    Data processing represents one of the most significant challenges in modern AI workflows. The latest generation of AI models demands unprecedented volumes of training data, placing enormous pressure on storage and networking infrastructure. 

    DOCA addresses this challenge through its comprehensive data acceleration capabilities. The Data Path Accelerator (DPA) subsystem of BlueField-3 provides a programming model for offloading communication-centric user code to run on dedicated DPA processors. This offloading significantly reduces CPU overhead while increasing performance through DPU acceleration.

    In addition, the DOCA Compress Library delivers hardware-accelerated compression and decompression for AI pipelines, reducing data transfer times and storage requirements without adding computational load to CPUs or GPUs. Similarly, the DOCA Erasure Coding Library provides the resilient data storage capabilities critical for protecting valuable AI datasets.
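
    The recovery principle behind erasure coding can be shown with its simplest instance: a single XOR parity block, as in RAID 5. The C sketch below is only a toy illustration; the DOCA Erasure Coding Library implements more general k+m codes in hardware that tolerate multiple simultaneous failures.

    // Simplest erasure code: one XOR parity block over k data blocks.
    // Losing any single block (data or parity) is recoverable by
    // XORing the survivors.
    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    #define BLK 8   // tiny block size for the demo
    #define K   3   // number of data blocks

    static void xor_into(uint8_t *dst, const uint8_t *src)
    {
        for (int i = 0; i < BLK; i++) dst[i] ^= src[i];
    }

    int main(void)
    {
        uint8_t data[K][BLK] = { "block-0", "block-1", "block-2" };
        uint8_t parity[BLK] = {0};

        for (int b = 0; b < K; b++) xor_into(parity, data[b]);  // encode

        // Simulate losing block 1, then rebuild it from the parity
        // block and the surviving data blocks.
        uint8_t rebuilt[BLK];
        memcpy(rebuilt, parity, BLK);
        xor_into(rebuilt, data[0]);
        xor_into(rebuilt, data[2]);

        printf("recovered: %s\n", (char *)rebuilt);  // prints "block-1"
        return 0;
    }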

    High-performance networking for AI data pipelines is likewise enabled through the DOCA Flow Library, which provides sophisticated packet processing capabilities for optimizing data movement across the network. The DOCA Rivermax Library further enhances networking performance with advanced features for streamlined data transfer between storage systems and compute nodes.

    Optimizing network performance for AI data pipelines

    DOCA Host-Based Networking (HBN) 3.0 offers impressive scalability improvements for controller-less VPC networking, supporting up to 8K VTEPs and 80K Type-5 routes, with plans to scale to 16K VTEPs and beyond. In addition, DOCA 3.0 introduces two new HBN features: Bidirectional Forwarding Detection (BFD) support (GA), providing quick route convergence through proactive link monitoring, and ECMP failover enhancements, ensuring minimal downtime through faster failovers. 

    Collectively, these features make HBN ideal for bare metal deployments and enable AI platforms to handle massive data flows between storage systems, compute nodes, and external data sources.

    For AI developers, DOCA intelligent data platform capabilities translate to:

    • Reduced data processing latency for training and inference pipelines
    • Higher throughput for data-intensive AI operations
    • More efficient resource utilization across computing and storage infrastructure
    • Support for emerging standards like IPMX for audiovisual AI applications

    The enhanced DOCA FireFly service brings advanced time synchronization capabilities through hardware acceleration, providing high-precision synchronization essential for distributed AI training workloads. This feature enables more efficient coordination across GPU clusters, particularly important for techniques like large batch training and model parallelism.
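
    FireFly's precision rests on the PTP hardware clock (PHC) embedded in the NIC. The sketch below reads such a clock directly using standard Linux conventions (the /dev/ptp0 device node and the FD_TO_CLOCKID idiom); nothing in it is DOCA-specific, and the device path is an assumption about the host's configuration.

    // Read a PTP hardware clock (PHC) on Linux: the kind of clock
    // DOCA FireFly keeps synchronized across the cluster. /dev/ptp0
    // is assumed to be the NIC's PHC; FD_TO_CLOCKID is the standard
    // Linux idiom for turning a PHC file descriptor into a clock id.
    #include <fcntl.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    #define CLOCKFD 3
    #define FD_TO_CLOCKID(fd) ((~(clockid_t)(fd) << 3) | CLOCKFD)

    int main(void)
    {
        int fd = open("/dev/ptp0", O_RDONLY);
        if (fd < 0) { perror("open /dev/ptp0"); return 1; }

        struct timespec ts;
        if (clock_gettime(FD_TO_CLOCKID(fd), &ts) != 0) {
            perror("clock_gettime");
            close(fd);
            return 1;
        }
        // Distributed training steps can be aligned against this
        // hardware timestamp instead of the less precise system clock.
        printf("PHC time: %lld.%09ld\n", (long long)ts.tv_sec, ts.tv_nsec);
        close(fd);
        return 0;
    }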

    Seamless DPU-powered infrastructure service management

    The complexity of modern AI infrastructure demands sophisticated orchestration capabilities. The DOCA Platform Framework (DPF), now GA for trusted hosts in DOCA 3.0, extends Kubernetes control plane functionality to DPUs, enabling administrators to deploy and orchestrate both NVIDIA DOCA services and third-party applications.

    DOCA services are DOCA-based products wrapped in containers for fast and easy deployment on BlueField DPUs. These services leverage DPU capabilities to offer telemetry, time synchronization, networking solutions, and more, all available through the NGC catalog.

    A diagrammatic representation of NVIDIA DPF and NVIDIA DOCA and how they interact with Kubernetes and NVIDIA NIMs microservices.
    Figure 2. DOCA Platform Framework stack (GA)

    By introducing a dedicated, secondary Kubernetes control plane, DPF empowers admins to efficiently manage DOCA services deployed on BlueField DPUs. The framework abstracts the complexity of DPU management, enabling administrators to interact with familiar Kubernetes constructs. This approach significantly simplifies the deployment and operation of AI infrastructure services.

    The DPF service function chaining capabilities enable the integration of multiple services, such as accelerated networking, high-performance data services, and security functions on a single DPU. This orchestration capability creates a flexible, multivendor ecosystem for delivering accelerated network services to AI applications.
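
    Conceptually, a service function chain is an ordered pipeline that every packet traverses. The toy C sketch below models that pipeline with function pointers; DPF expresses the same idea declaratively through Kubernetes resources rather than code like this.

    // Conceptual service function chain (illustrative only, not the
    // DPF API): a packet traverses an ordered list of services on
    // the DPU, e.g. firewall -> telemetry -> forwarder, each possibly
    // from a different vendor but chained on one BlueField.
    #include <stdio.h>

    struct pkt { const char *desc; };

    typedef int (*service_fn)(struct pkt *);  // 0 = pass, nonzero = drop

    static int firewall(struct pkt *p)  { printf("firewall:  %s\n", p->desc); return 0; }
    static int telemetry(struct pkt *p) { printf("telemetry: %s\n", p->desc); return 0; }
    static int forward(struct pkt *p)   { printf("forward:   %s\n", p->desc); return 0; }

    int main(void)
    {
        // The chain order is the service graph the orchestrator
        // would program on the DPU.
        service_fn chain[] = { firewall, telemetry, forward };
        struct pkt p = { .desc = "tenant-A flow" };

        for (size_t i = 0; i < sizeof chain / sizeof chain[0]; i++)
            if (chain[i](&p))      // any stage may drop the packet
                break;
        return 0;
    }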

    Real-world deployments demonstrate the tangible benefits of this approach. The integration of NVIDIA DOCA Platform Framework with Red Hat OpenShift has shown significant performance improvements, with RDMA tests reaching 383.72 Gb/sec average bandwidth. This level of networking performance is essential for data-intensive AI workloads like LLMs.

    For AI platform operators, DOCA infrastructure service orchestration capabilities provide:

    • Simplified deployment and management of complex, AI-optimized infrastructure
    • Robust lifecycle management for seamless service updates, scaling, and rollbacks
    • Predeployment verification to ensure compatibility and requirements are met
    • Real-time monitoring and debuggability for high reliability

    Accelerate and secure NVIDIA NIM microservices and AI workloads 

    Leveraging the advanced orchestration features of the DOCA Platform Framework, DOCA HBN, OVS-DOCA, DOCA SNAP Virtio-fs, and the newest service, NVIDIA DOCA Argus, are combined to accelerate and secure NVIDIA NIM microservices and AI workloads. This combination highlights the evolving value of DOCA and offers a glimpse of how future solutions will continue to emerge from the framework.

    DOCA Argus is a cybersecurity framework designed to protect AI factories by delivering agentless, real-time threat detection on BlueField DPUs. Operating independently of the host system, Argus detects and responds to attacks up to 1,000x faster than traditional solutions, without impacting performance. 

    It integrates seamlessly with enterprise security systems, providing continuous monitoring and automated threat mitigation. Leveraging advanced memory forensics and actionable intelligence, Argus is optimized to secure containerized and multitenant AI workloads at scale.

    In conjunction with OVS-DOCA and DOCA SNAP Virtio-fs, DOCA Argus forms an innovative security solution for AI workloads on NVIDIA BlueField DPUs, addressing distinct infrastructure layers while enabling cross-component threat mitigation. 

    A diagrammatic representation showing how the DOCA Platform Framework orchestrates DOCA HBN, OVS-DOCA, DOCA SNAP Virtio-fs, and the newest service, NVIDIA DOCA Argus, to accelerate and secure NVIDIA NIM microservices and AI workloads.
    Figure 3. Accelerate and secure NIM microservices and AI workloads with DOCA 3.0

    DOCA Argus (compute layer) monitors AI workloads through DPU-level memory and process analysis, relying on OVS-DOCA to offload and isolate network traffic (networking layer). Simultaneously, DOCA SNAP Virtio-fs (storage layer) virtualizes file system access through DPU-emulated Virtio devices, isolating storage I/O from host kernels and providing Argus with audit logs for anomalous access patterns.

    This integrated framework embeds security into compute, network, and storage layers, enabling submillisecond threat response for NIM microservices while maintaining scalability for containerized AI pipelines.

    Get started with DOCA 3.0

    As AI continues to transform industries, the infrastructure supporting it must evolve accordingly. NVIDIA DOCA Framework represents a fundamental shift in how developers build and deploy AI platforms, offering unprecedented performance, security, and efficiency through its comprehensive set of libraries and services.

    The DOCA SDK is built around a set of DOCA libraries designed to leverage the capabilities of BlueField DPUs. With over 20 specialized libraries, developers have access to a powerful toolkit for building optimized AI infrastructure.

    DOCA Services complement these libraries by providing containerized solutions for specific use cases. Find them through the NGC catalog with labels such as DOCA and DPU. This containerized approach enables rapid deployment and simplified management of infrastructure components critical for AI operations.

    The continuing evolution of DOCA, with regular framework updates and new capabilities, ensures that developers can stay at the forefront of AI infrastructure innovation. With thousands of developers already leveraging DOCA, the ecosystem continues to grow, driving new possibilities for AI application development.

    For developers looking to build the next generation of AI platforms, NVIDIA DOCA provides the comprehensive toolkit needed to harness the full potential of BlueField DPUs and ConnectX SuperNICs and create infrastructure that can scale to meet the demands of tomorrow’s AI workloads. By embracing DOCA, organizations can position themselves at the cutting edge of AI infrastructure innovation, ready to power the next frontier of AI.

    NVIDIA DOCA 3.0 marks significant advancements in both AI compute fabric and cloud computing infrastructure. Download NVIDIA DOCA to begin your development journey with all the benefits DOCA has to offer. 
