AI factories rely on more than just compute fabrics. While the East-West network connecting the GPUs is critical to AI application performance, the storage fabric, which connects the high-speed storage arrays, is equally important. Storage performance plays a key role across several stages of the AI lifecycle, including training checkpointing, inference techniques such as retrieval-augmented generation…
NVIDIA DOCA GPUNetIO is a library within the NVIDIA DOCA SDK, specifically designed for real-time inline GPU packet processing. It combines technologies like GPUDirect RDMA and GPUDirect Async to enable the creation of GPU-centric applications where a CUDA kernel can directly communicate with the network interface card (NIC) for sending and receiving packets, bypassing the CPU and excluding it…
NVIDIA AI Enterprise is an end-to-end, secure, cloud-native suite of AI software. The recent release of NVIDIA AI Enterprise 3.0 introduces new features to help optimize the performance and efficiency of production AI. This post provides details about the new features listed below and how they work. New AI workflows in the 3.0 release of NVIDIA AI Enterprise help reduce the…
Whole slide imaging (WSI), the digitization of tissue on slides using whole slide scanners, is gaining traction in healthcare. WSI enables clinicians in histopathology, immunohistochemistry, and cytology to: This post explains how GPU-accelerated toolkits improve the input/output (I/O) performance and image processing tasks. More specifically, it details how to: Time savings…
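As a minimal sketch of GPU-accelerated WSI loading, the snippet below uses the cuCIM library (one such GPU-accelerated toolkit); the file name "slide.tif" and the tile coordinates are illustrative assumptions, not values from the post.

```python
# Minimal sketch: reading a whole-slide image tile with cuCIM.
# Assumes the `cucim` package is installed and a local file "slide.tif" exists.
import numpy as np
from cucim import CuImage

slide = CuImage("slide.tif")        # open the whole-slide image
print(slide.resolutions)            # inspect the resolution pyramid

# Read a 512x512 tile from the highest-resolution level (level 0).
region = slide.read_region(location=(0, 0), size=(512, 512), level=0)
tile = np.asarray(region)           # view the tile as an (H, W, C) array
print(tile.shape, tile.dtype)
```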
Today's leading-edge high performance computing (HPC) systems contain tens of thousands of GPUs. In NVIDIA systems, GPUs are connected on nodes through the NVLink scale-up interconnect, and across nodes through a scale-out network like InfiniBand. The software libraries that GPUs use to communicate, share work, and efficiently operate in parallel are collectively called NVIDIA Magnum IO…
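To make the idea of GPUs communicating across nodes concrete, here is a minimal sketch of a multi-GPU all-reduce using mpi4py and CuPy as stand-ins for the Magnum IO communication stack; it assumes a CUDA-aware MPI build, and the script name in the comment is hypothetical.

```python
# Minimal sketch of multi-GPU communication, assuming a CUDA-aware MPI build
# plus the mpi4py and cupy packages. Run with e.g.: mpirun -np 4 python allreduce.py
import cupy as cp
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
cp.cuda.Device(rank % cp.cuda.runtime.getDeviceCount()).use()  # one GPU per rank

send = cp.full(1_000_000, rank, dtype=cp.float32)  # each rank's contribution
recv = cp.empty_like(send)
comm.Allreduce(send, recv, op=MPI.SUM)             # GPU buffers passed directly
print(f"rank {rank}: sum of ranks = {recv[0]}")
```

With a CUDA-aware MPI, the device buffers are handed to the transport directly, so data can move GPU-to-GPU without staging through host memory.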
Clinical applications for AI are improving digital surgery, helping to reduce errors, provide consistency, and enable surgeon augmentations that were previously unimaginable. In endoscopy, a minimally invasive procedure used to examine the interior of an organ or body cavity, AI and accelerated computing are enabling better detection rates and visibility.
NVIDIA Clara Holoscan provides a scalable medical device computing platform for developers to create AI microservices and deliver insights in real time. The platform optimizes every stage of the data pipeline: from high-bandwidth data streaming and physics-based analysis to accelerated AI inference and graphics visualization. The NVIDIA Clara AGX Developer Kit, which is now available…
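To illustrate the pipeline model, below is a minimal sketch in the style of the Holoscan SDK's Python API (the SDK that grew out of Clara Holoscan); the two toy operators, their port names, and the frame payload are all illustrative assumptions standing in for the streaming, inference, and visualization stages the post describes.

```python
# Minimal sketch of a Holoscan-style operator pipeline; assumes the `holoscan`
# Python package. Operator and port names here are hypothetical.
from holoscan.conditions import CountCondition
from holoscan.core import Application, Operator, OperatorSpec

class SourceOp(Operator):
    def setup(self, spec: OperatorSpec):
        spec.output("frame")
    def compute(self, op_input, op_output, context):
        op_output.emit({"pixels": [0] * 16}, "frame")   # stand-in for a video frame

class SinkOp(Operator):
    def setup(self, spec: OperatorSpec):
        spec.input("frame")
    def compute(self, op_input, op_output, context):
        frame = op_input.receive("frame")
        print(f"processed frame with {len(frame['pixels'])} pixels")

class App(Application):
    def compose(self):
        src = SourceOp(self, CountCondition(self, 3), name="src")  # emit 3 frames
        sink = SinkOp(self, name="sink")
        self.add_flow(src, sink, {("frame", "frame")})

App().run()
```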
This is the fourth post in the Accelerating IO series. It addresses storage issues and shares recent results and directions with our partners. We cover the new GPUDirect Storage release, benefits, and implementation. Accelerated computing needs accelerated IO. Otherwise, computing resources get starved for data. Given that the fraction of all workflows for which data fits in memory is…
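As a minimal sketch of what GPUDirect Storage looks like from Python, the snippet below uses KvikIO, the Python bindings around the cuFile API; the file name "data.bin" and buffer size are illustrative assumptions.

```python
# Minimal sketch of GPU-direct file I/O via KvikIO (Python cuFile bindings).
# Assumes the `kvikio` and `cupy` packages and a local file "data.bin".
# Where GDS is unavailable, KvikIO falls back to a host bounce buffer.
import cupy as cp
import kvikio

buf = cp.empty(1_000_000, dtype=cp.float32)   # destination in GPU memory

with kvikio.CuFile("data.bin", "r") as f:
    nbytes = f.read(buf)                      # DMA from storage into GPU memory
print(f"read {nbytes} bytes directly into device memory")
```

The point of the direct path is that the read lands in device memory without a CPU-side copy, which is what keeps the GPUs from starving for data.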
Efficient pipeline design is crucial for data scientists. When composing complex end-to-end workflows, you may choose from a wide variety of building blocks, each of them specialized for a dedicated task. Unfortunately, repeatedly converting between data formats is an error-prone and performance-degrading endeavor. Let's change that! In this post series, we discuss different aspects of…
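One established way to avoid those conversions is zero-copy interoperability through DLPack and the `__cuda_array_interface__` protocol; the sketch below assumes the cupy, torch, and numba packages and shows three libraries sharing one device allocation.

```python
# Minimal sketch of zero-copy data exchange between GPU libraries.
# Assumes cupy, torch (>= 1.11 for torch.from_dlpack), and numba.
import cupy as cp
import torch
from numba import cuda

arr = cp.arange(8, dtype=cp.float32)   # allocated once, on the GPU

t = torch.from_dlpack(arr)             # CuPy -> PyTorch via DLPack
nb = cuda.as_cuda_array(arr)           # CuPy -> Numba via __cuda_array_interface__

t += 1                                 # mutate through the PyTorch view...
print(arr)                             # ...and the CuPy array reflects it
print(nb.copy_to_host())               # ...as does the Numba view
```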
This is the second post in the Accelerating IO series, which describes the architecture, components, and benefits of Magnum IO, the IO subsystem of the modern data center. The first post in this series introduced the Magnum IO architecture and positioned it in the broader context of CUDA, CUDA-X, and vertical application domains. Of the four major components of the architecture…
This is the first post in the Accelerating IO series, which describes the architecture, components, storage, and benefits of Magnum IO, the IO subsystem of the modern data center. Previously the boundary of the unit of computing, sheet metal no longer constrains the resources that can be applied to a single problem or the data set that can be housed. The new unit is the data center.
Remote Direct Memory Access (RDMA) allows computers to exchange data in memory without the involvement of a CPU. The benefits include low latency and high bandwidth data exchange. GPUDirect RDMA extends the same philosophy to the GPU and the connected peripherals in Jetson AGX Xavier. GPUDirect RDMA enables a direct path for data exchange between the GPU-accessible memory (the CUDA memory) and a…
NVIDIA GPUDirect RDMA is a technology that enables a direct path for data exchange between the GPU and third-party peer devices using standard features of PCI Express. Examples of third-party devices include network interfaces, video acquisition devices, storage adapters, and medical equipment. Enabled on Tesla and Quadro-class GPUs, GPUDirect RDMA relies on the ability of NVIDIA GPUs to expose…