AI factories rely on more than just compute fabrics. While the East-West network connecting the GPUs is critical to AI application performance, the storage fabric, which connects the high-speed storage arrays, is equally important. Storage performance plays a key role across several stages of the AI lifecycle, including training checkpointing, inference techniques such as retrieval-augmented generation…
NVIDIA DOCA GPUNetIO is a library within the NVIDIA DOCA SDK, specifically designed for real-time inline GPU packet processing. It combines technologies like GPUDirect RDMA and GPUDirect Async to enable the creation of GPU-centric applications where a CUDA kernel can directly communicate with the network interface card (NIC) for sending and receiving packets, bypassing the CPU and excluding it…
NVIDIA AI Enterprise is an end-to-end, secure, cloud-native suite of AI software. The recent release of NVIDIA AI Enterprise 3.0 introduces new features to help optimize the performance and efficiency of production AI. This post provides details about the new features listed below and how they work. New AI workflows in the 3.0 release of NVIDIA AI Enterprise help reduce the…
Whole slide imaging (WSI), the digitization of tissue on slides using whole slide scanners, is gaining traction in healthcare. WSI enables clinicians in histopathology, immunohistochemistry, and cytology to: This post explains how GPU-accelerated toolkits improve the input/output (I/O) performance and image processing tasks. More specifically, it details how to: Time savings…
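As a minimal sketch of GPU-accelerated WSI loading, the snippet below uses the cuCIM library (one such GPU-accelerated toolkit); the file name "slide.tif" and the tile coordinates are illustrative assumptions, not values from the post.

```python
# Minimal sketch: reading a whole-slide image tile with cuCIM.
# Assumes the `cucim` package is installed and a local file "slide.tif" exists.
import numpy as np
from cucim import CuImage

slide = CuImage("slide.tif")        # open the whole-slide image
print(slide.resolutions)            # inspect the resolution pyramid

# Read a 512x512 tile from the highest-resolution level (level 0).
region = slide.read_region(location=(0, 0), size=(512, 512), level=0)
tile = np.asarray(region)           # view the tile as an (H, W, C) array
print(tile.shape, tile.dtype)
```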
Today's leading-edge high performance computing (HPC) systems contain tens of thousands of GPUs. In NVIDIA systems, GPUs are connected on nodes through the NVLink scale-up interconnect, and across nodes through a scale-out network like InfiniBand. The software libraries that GPUs use to communicate, share work, and efficiently operate in parallel are collectively called NVIDIA Magnum IO…
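To make the idea of GPUs communicating across nodes concrete, here is a minimal sketch of a multi-GPU all-reduce using mpi4py and CuPy as stand-ins for the Magnum IO communication stack; it assumes a CUDA-aware MPI build, and the script name in the comment is hypothetical.

```python
# Minimal sketch of multi-GPU communication, assuming a CUDA-aware MPI build
# plus the mpi4py and cupy packages. Run with e.g.: mpirun -np 4 python allreduce.py
import cupy as cp
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
cp.cuda.Device(rank % cp.cuda.runtime.getDeviceCount()).use()  # one GPU per rank

send = cp.full(1_000_000, rank, dtype=cp.float32)  # each rank's contribution
recv = cp.empty_like(send)
comm.Allreduce(send, recv, op=MPI.SUM)             # GPU buffers passed directly
print(f"rank {rank}: sum of ranks = {recv[0]}")
```

With a CUDA-aware MPI, the device buffers are handed to the transport directly, so data can move GPU-to-GPU without staging through host memory.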
Clinical applications for AI are improving digital surgery, helping to reduce errors, provide consistency, and enable surgeon augmentations that were previously unimaginable. In endoscopy, a minimally invasive procedure used to examine the interior of an organ or body cavity, AI and accelerated computing are enabling better detection rates and visibility.
NVIDIA Clara Holoscan provides a scalable medical device computing platform for developers to create AI microservices and deliver insights in real time. The platform optimizes every stage of the data pipeline: from high-bandwidth data streaming and physics-based analysis to accelerated AI inference and graphics visualization. The NVIDIA Clara AGX Developer Kit, which is now available…
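To illustrate the pipeline model, below is a minimal sketch in the style of the Holoscan SDK's Python API (the SDK that grew out of Clara Holoscan); the two toy operators, their port names, and the frame payload are all illustrative assumptions standing in for the streaming, inference, and visualization stages the post describes.

```python
# Minimal sketch of a Holoscan-style operator pipeline; assumes the `holoscan`
# Python package. Operator and port names here are hypothetical.
from holoscan.conditions import CountCondition
from holoscan.core import Application, Operator, OperatorSpec

class SourceOp(Operator):
    def setup(self, spec: OperatorSpec):
        spec.output("frame")
    def compute(self, op_input, op_output, context):
        op_output.emit({"pixels": [0] * 16}, "frame")   # stand-in for a video frame

class SinkOp(Operator):
    def setup(self, spec: OperatorSpec):
        spec.input("frame")
    def compute(self, op_input, op_output, context):
        frame = op_input.receive("frame")
        print(f"processed frame with {len(frame['pixels'])} pixels")

class App(Application):
    def compose(self):
        src = SourceOp(self, CountCondition(self, 3), name="src")  # emit 3 frames
        sink = SinkOp(self, name="sink")
        self.add_flow(src, sink, {("frame", "frame")})

App().run()
```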
This is the fourth post in the Accelerating IO series. It addresses storage issues and shares recent results and directions with our partners. We cover the new GPUDirect Storage release, benefits, and implementation. Accelerated computing needs accelerated IO. Otherwise, computing resources get starved for data. Given that the fraction of all workflows for which data fits in memory is…
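As a minimal sketch of what GPUDirect Storage looks like from Python, the snippet below uses KvikIO, the Python bindings around the cuFile API; the file name "data.bin" and buffer size are illustrative assumptions.

```python
# Minimal sketch of GPU-direct file I/O via KvikIO (Python cuFile bindings).
# Assumes the `kvikio` and `cupy` packages and a local file "data.bin".
# Where GDS is unavailable, KvikIO falls back to a host bounce buffer.
import cupy as cp
import kvikio

buf = cp.empty(1_000_000, dtype=cp.float32)   # destination in GPU memory

with kvikio.CuFile("data.bin", "r") as f:
    nbytes = f.read(buf)                      # DMA from storage into GPU memory
print(f"read {nbytes} bytes directly into device memory")
```

The point of the direct path is that the read lands in device memory without a CPU-side copy, which is what keeps the GPUs from starving for data.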
Efficient pipeline design is crucial for data scientists. When composing complex end-to-end workflows, you may choose from a wide variety of building blocks, each of them specialized for a dedicated task. Unfortunately, repeatedly converting between data formats is an error-prone and performance-degrading endeavor. Let's change that! In this post series, we discuss different aspects of…
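One established way to avoid those conversions is zero-copy interoperability through DLPack and the `__cuda_array_interface__` protocol; the sketch below assumes the cupy, torch, and numba packages and shows three libraries sharing one device allocation.

```python
# Minimal sketch of zero-copy data exchange between GPU libraries.
# Assumes cupy, torch (>= 1.11 for torch.from_dlpack), and numba.
import cupy as cp
import torch
from numba import cuda

arr = cp.arange(8, dtype=cp.float32)   # allocated once, on the GPU

t = torch.from_dlpack(arr)             # CuPy -> PyTorch via DLPack
nb = cuda.as_cuda_array(arr)           # CuPy -> Numba via __cuda_array_interface__

t += 1                                 # mutate through the PyTorch view...
print(arr)                             # ...and the CuPy array reflects it
print(nb.copy_to_host())               # ...as does the Numba view
```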
This is the second post in the Accelerating IO series, which describes the architecture, components, and benefits of Magnum IO, the IO subsystem of the modern data center. The first post in this series introduced the Magnum IO architecture and positioned it in the broader context of CUDA, CUDA-X, and vertical application domains. Of the four major components of the architecture…
This is the first post in the Accelerating IO series, which describes the architecture, components, storage, and benefits of Magnum IO, the IO subsystem of the modern data center. Previously the boundary of the unit of computing, sheet metal no longer constrains the resources that can be applied to a single problem or the data set that can be housed. The new unit is the data center.
Remote Direct Memory Access (RDMA) allows computers to exchange data in memory without the involvement of a CPU. The benefits include low latency and high bandwidth data exchange. GPUDirect RDMA extends the same philosophy to the GPU and the connected peripherals in Jetson AGX Xavier. GPUDirect RDMA enables a direct path for data exchange between the GPU-accessible memory (the CUDA memory) and a…
NVIDIA GPUDirect RDMA is a technology that enables a direct path for data exchange between the GPU and third-party peer devices using standard features of PCI Express. Examples of third-party devices include network interfaces, video acquisition devices, storage adapters, and medical equipment. Enabled on Tesla and Quadro-class GPUs, GPUDirect RDMA relies on the ability of NVIDIA GPUs to expose…