As AI evolves toward planning, research, and reasoning with agentic AI, workflows are becoming increasingly complex. To deploy agentic AI applications efficiently, AI clouds need a software-defined, hardware-accelerated application delivery and security platform (ADSP) that enables dynamic load balancing, robust security, cloud-native multi-tenancy, and rich observability. F5 BIG-IP Next for Kubernetes, powered by NVIDIA BlueField-3 data processing units (DPUs), streamlines Kubernetes application delivery and agentic AI deployment while reducing total cost of ownership through operational efficiency and optimized power consumption.
AI has advanced rapidly since the introduction of OpenAI’s ChatGPT in 2022. Initially, AI development focused on model training, using GPUs to process large datasets and optimize performance. Today, the focus has expanded to distributed inference, with large language models (LLMs) answering queries and integrating enterprise data through retrieval-augmented generation (RAG), and with reasoning models like DeepSeek R1 emerging.
Agentic AI now takes generative AI to the next level. Instead of the single-shot approach LLMs employ in answering a question, agentic AI solves complex problems through planning and reasoning. As an example, NVIDIA’s digital human blueprint workflow incorporates over a dozen containerized NVIDIA Inference Microservices (NIM), including LLMs, vector databases, RAG, speech recognition, and avatar rendering. These components work together to create a cohesive agentic workflow.

Agentic workflows, which involve planning, reasoning, test-time scaling, and long thinking, are even more complex. Because they use many components and data stores within a data center or across multiple data centers, implementing agentic AI on a single node becomes impractical. Agentic AI inferencing requires distributed and disaggregated multi-node infrastructure consisting of accelerated compute, networking, and storage to handle the constant data movement between the agentic AI system components.
The BlueField-3 DPU is key to optimizing AI data movement in AI clouds and AI factories. BlueField is an accelerated networking platform that combines high-performance, programmable acceleration engines with power-efficient Arm compute cores. This combination provides the performance, efficiency, and flexibility needed to program agentic AI data flows between interconnected components.
To simplify deployment and operations of AI factories, NVIDIA has developed a reference architecture for sovereign AI cloud operators, also known as NVIDIA Cloud Partners (NCPs). BlueField is a key component of this reference architecture, as it efficiently handles the north-south networking (including inter-cluster traffic and storage access) for GPU clusters.

Introducing F5 BIG-IP Next for Kubernetes
Optimized data-center infrastructure is crucial for AI clouds and AI factories, and so is a high-performance, efficient application delivery controller (ADC). F5’s BIG-IP Next for Kubernetes (BNK) provides dynamic load balancing, robust security, cloud-native multi-tenancy, and rich observability for AI factories. BNK, accelerated with BlueField-3, enables high-performance cloud-native networking and zero-trust security at scale for AI clouds, streamlining agentic AI deployment and operations.
Kubernetes promises easy scalability and monitoring of cloud-native applications but often introduces complexity in practice. Deploying microservices in Kubernetes involves numerous elements, such as ingress and egress controllers, micro-segmentation, network policy management, identity management, API policies, and service meshes, all of which make it difficult to align data flows with applications. Agentic AI deployment compounds this complexity, as it relies on multiple microservices deployed across diverse environments. Additionally, AI clouds face the challenge of partitioning GPU resources at a granular level while accurately tracking usage per customer.
NCP and sovereign AI cloud providers need cloud-native multi-tenancy to efficiently utilize GPU resources across multiple customers, rather than overprovisioning them for each customer. BNK, accelerated with BlueField-3, learns and routes traffic to Kubernetes namespaces, thus providing true cloud-native load balancing.
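This post does not show BNK’s own tenant configuration, but the underlying Kubernetes building block is familiar. As a generic sketch only, assuming the official Kubernetes Python client and illustrative namespace names, the snippet below applies a NetworkPolicy that admits ingress traffic solely from pods in the same tenant namespace:

```python
from kubernetes import client, config


def isolate_namespace(namespace: str) -> None:
    """Apply an ingress policy that only admits pods from the same namespace."""
    policy = client.V1NetworkPolicy(
        metadata=client.V1ObjectMeta(name="tenant-isolation", namespace=namespace),
        spec=client.V1NetworkPolicySpec(
            pod_selector=client.V1LabelSelector(),  # empty selector: every pod in the namespace
            policy_types=["Ingress"],
            ingress=[
                client.V1NetworkPolicyIngressRule(
                    # An empty pod selector with no namespace selector matches
                    # only peers inside this same namespace.
                    _from=[client.V1NetworkPolicyPeer(pod_selector=client.V1LabelSelector())]
                )
            ],
        ),
    )
    client.NetworkingV1Api().create_namespaced_network_policy(namespace, policy)


if __name__ == "__main__":
    config.load_kube_config()  # assumes a local kubeconfig with cluster access
    for tenant_ns in ("tenant-a", "tenant-b"):  # hypothetical tenant namespaces
        isolate_namespace(tenant_ns)
```

Policies like this express tenant boundaries at the Kubernetes layer; delivering and accelerating the resulting per-namespace data path is where a DPU-accelerated ADC such as BNK comes in.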

BNK accelerated with BlueField-3 also improves power efficiency by offloading the data path from host CPU servers to power-efficient Arm cores on the DPU while boosting throughput. This translates to much higher network energy efficiency in terms of throughput per watt.
SoftBank’s experience
SoftBank operates two of the 20 largest supercomputers in the world and develops Sarashina, a prominent Japanese LLM. To expand its cloud-native data centers from training to providing scalable AI inference services, SoftBank requires enterprise-grade tenant isolation and security. It must efficiently utilize all available computing resources while minimizing power consumption and maintaining high network performance.
SoftBank tested BNK on an NVIDIA H100 GPU cluster. The proof of concept (PoC) measured the networking performance of applications running in two separate Kubernetes namespaces. Network traffic was completely isolated for each tenant namespace.
During the PoC, SoftBank generated 100 concurrent HTTP GET requests, sustaining 75 Gbps of throughput at 18,000 requests per second.
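The post does not name the load generator SoftBank used; purely as an illustration of the traffic pattern, the Python sketch below fires 100 concurrent GET requests with aiohttp (the endpoint URL and request count per batch are placeholder assumptions):

```python
import asyncio

import aiohttp

INGRESS_URL = "http://tenant-a.example.internal/"  # hypothetical ingress endpoint
CONCURRENCY = 100  # matches the 100 concurrent GETs described in the PoC


async def fetch(session: aiohttp.ClientSession) -> int:
    # Issue one GET and return the response body size in bytes.
    async with session.get(INGRESS_URL) as resp:
        body = await resp.read()
        return len(body)


async def main() -> None:
    async with aiohttp.ClientSession() as session:
        # Launch CONCURRENCY requests in parallel; a real load test
        # would repeat this in a loop and record latency percentiles.
        sizes = await asyncio.gather(*(fetch(session) for _ in range(CONCURRENCY)))
        print(f"completed {len(sizes)} requests, {sum(sizes)} bytes total")


if __name__ == "__main__":
    asyncio.run(main())
```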
Next, SoftBank compared the operational efficiency of BNK accelerated with BlueField-3 against open source NGINX running on the host CPU. The results were impressive:
- CPU offloading: BNK accelerated with BlueField-3 achieved 77 Gbps of throughput without consuming any host CPU cores, whereas open source NGINX as the ingress controller delivered 65 Gbps while consuming 30 host cores.
- Latency: HTTP GET response time (time to first byte of an L7 request) was 11x lower with BNK powered by BlueField.
- CPU utilization: BNK with BlueField showed 99% lower CPU utilization compared to NGINX running in host software.
- Network energy efficiency (measured as throughput per watt): BlueField acceleration delivered 190x higher energy efficiency, at 57 Gbps/watt vs. 0.3 Gbps/watt for open source NGINX (see the quick check below).
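The 190x figure follows directly from the two published throughput-per-watt numbers. The short Python check below reproduces it, along with the raw throughput comparison; it uses only the figures quoted in the bullets above:

```python
# Reproduce the published PoC ratios from the figures quoted above.
bnk_gbps_per_watt = 57.0    # BNK accelerated with BlueField-3
nginx_gbps_per_watt = 0.3   # open source NGINX on host CPUs

efficiency_gain = bnk_gbps_per_watt / nginx_gbps_per_watt
print(f"energy efficiency gain: {efficiency_gain:.0f}x")  # -> 190x

bnk_gbps, nginx_gbps = 77.0, 65.0  # measured throughput in Gbps
print(f"throughput gain: {bnk_gbps / nginx_gbps:.2f}x")   # -> 1.18x
```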
This PoC showed how north-south traffic to AI clouds can be efficiently managed using the F5 ADSP accelerated by BlueField-3.
Conclusion
AI clouds and AI factories need cloud-native data centers architected for high performance, power efficiency, cloud-native multi-tenancy, and security. NVIDIA’s collaboration with F5 achieves best-in-class performance, security, and efficiency. SoftBank’s impressive PoC results validate that offloading and accelerating application delivery with DPUs turbocharges AI factories to meet the extreme demands of modern AI workloads.
For more information on SoftBank’s PoC and the capabilities of F5 BIG-IP Next for Kubernetes with BlueField-3 acceleration, please refer to the detailed NVIDIA GTC presentation.
At the RSA Conference this year, F5 and NVIDIA announced the general availability of BIG-IP Next for Kubernetes (BNK) powered by BlueField-3 to address major Kubernetes networking and security challenges for AI clouds. Please contact your F5 or NVIDIA sales representative for demo or PoC inquiries.
Updated on July 24 with F5 BNK branding.