As the demand for sophisticated AI capabilities escalates, VAST Data introduces the VAST Data Platform, now enhanced with NVIDIA BlueField DPUs. This innovation is tailored to meet the stringent demands of AI-driven data centers and optimize AI workloads and data management. This post presents how BlueField DPUs provide VAST with a significant boost in both performance and efficiency to…
NVIDIA DOCA GPUNetIO is a library within the NVIDIA DOCA SDK, specifically designed for real-time inline GPU packet processing. It combines technologies like GPUDirect RDMA and GPUDirect Async to enable the creation of GPU-centric applications where a CUDA kernel can directly communicate with the network interface card (NIC) for sending and receiving packets, bypassing the CPU and excluding it…
NVIDIA Zero Touch RoCE (ZTR) enables data centers to seamlessly deploy RDMA over Converged Ethernet (RoCE) without requiring any special switch configuration. Until recently, ZTR was optimal for only small to medium-sized data centers. Meanwhile, large-scale deployments have traditionally relied on Explicit Congestion Notification (ECN) to enable RoCE network transport…
Hybrid cloud refers to a mix of computing and storage services spanning on-premises infrastructure, such as Dell EMC VxRail hyperconverged infrastructure (HCI), and multiple public cloud services such as Amazon Web Services or Microsoft Azure. Hybrid cloud architecture gives you the flexibility to maintain traditional IT on-premises deployments for running business-critical applications or to protect sensitive…
Efficient pipeline design is crucial for data scientists. When composing complex end-to-end workflows, you may choose from a wide variety of building blocks, each of them specialized for a dedicated task. Unfortunately, repeatedly converting between data formats is an error-prone and performance-degrading endeavor. Let's change that! In this post series, we discuss different aspects of…
This post was originally published on the Mellanox blog. Network File System (NFS) is a ubiquitous component of most modern clusters. It was initially designed as a work-group filesystem, making a central file store available to and shared among several client servers. As NFS became more popular, it was used for mission-critical applications, which required access to storage. Next…
The Duchess of Windsor famously said that you can never be too rich or too thin. A similar observation is true when trying to match deep learning applications and compute resources: You can never have too much horsepower. Intractable problems in fields as diverse as finance, security, medical research, resource exploration, self-driving vehicles, and defense are being solved today by training…
This is the second post in the Accelerating IO series, which describes the architecture, components, and benefits of Magnum IO, the IO subsystem of the modern data center. The first post in this series introduced the Magnum IO architecture and positioned it in the broader context of CUDA, CUDA-X, and vertical application domains. Of the four major components of the architecture…
This is the first post in the Accelerating IO series, which describes the architecture, components, storage, and benefits of Magnum IO, the IO subsystem of the modern data center. Previously the boundary of the unit of computing, sheet metal no longer constrains the resources that can be applied to a single problem or the data set that can be housed. The new unit is the data center.
Remote Direct Memory Access (RDMA) allows computers to exchange data in memory without the involvement of a CPU. The benefits include low-latency, high-bandwidth data exchange. GPUDirect RDMA extends the same philosophy to the GPU and the connected peripherals in Jetson AGX Xavier. GPUDirect RDMA enables a direct path for data exchange between the GPU-accessible memory (the CUDA memory) and a…
NVIDIA GPUDirect RDMA is a technology that enables a direct path for data exchange between the GPU and third-party peer devices using standard features of PCI Express. Examples of third-party devices include network interfaces, video acquisition devices, storage adapters, and medical equipment. Enabled on Tesla and Quadro-class GPUs, GPUDirect RDMA relies on the ability of NVIDIA GPUs to expose…