MPI – NVIDIA Technical Blog
News and tutorials for developers, data scientists, and IT admins
Feed: http://www.open-lab.net/blog/feed/

Accelerating the HPCG Benchmark with NVIDIA Math Sparse Libraries
By Mohammad Almasri | http://www.open-lab.net/blog/?p=88566 | Published 2024-09-10

NVIDIA has continually advanced high-performance computing (HPC) by offering its highly optimized NVIDIA High-Performance Conjugate Gradient (HPCG) benchmark as part of the NVIDIA HPC benchmark collection. We now provide the NVIDIA HPCG benchmark in the /NVIDIA/nvidia-hpcg GitHub repo, built on NVIDIA's high-performance math libraries, cuSPARSE…
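
HPCG spends most of its time in sparse matrix-vector products on CSR-stored matrices, which is the work the NVIDIA implementation hands off to cuSPARSE. As a rough, hypothetical illustration of that core kernel (not code from the benchmark or from cuSPARSE), here is a minimal CSR SpMV in plain C:

```c
#include <stdio.h>

/* Minimal CSR sparse matrix-vector product y = A*x, the kernel that
 * dominates HPCG's conjugate-gradient iterations. In the NVIDIA HPCG
 * benchmark this operation is delegated to cuSPARSE; the plain-C loop
 * below only shows the access pattern being accelerated. */
static void spmv_csr(int nrows, const int *row_ptr, const int *col_idx,
                     const double *vals, const double *x, double *y)
{
    for (int i = 0; i < nrows; ++i) {
        double sum = 0.0;
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; ++j)
            sum += vals[j] * x[col_idx[j]];
        y[i] = sum;
    }
}

int main(void)
{
    /* 3x3 example matrix [[4,1,0],[1,4,1],[0,1,4]] in CSR form. */
    int    row_ptr[] = {0, 2, 5, 7};
    int    col_idx[] = {0, 1, 0, 1, 2, 1, 2};
    double vals[]    = {4, 1, 1, 4, 1, 1, 4};
    double x[]       = {1, 1, 1};
    double y[3];

    spmv_csr(3, row_ptr, col_idx, vals, x, y);
    printf("y = [%g %g %g]\n", y[0], y[1], y[2]);  /* expected: [5 6 5] */
    return 0;
}
```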

Source

Scaling VASP with NVIDIA Magnum IO
By Stefan Maintz | http://www.open-lab.net/blog/?p=57394 | Published 2022-11-15

You could make an argument that the history of civilization and technological advancement is the history of the search and discovery of materials. Ages are named not for leaders or civilizations but for the materials that defined them: Stone Age, Bronze Age, and so on. The current digital or information age could be renamed the Silicon or Semiconductor Age and retain the same meaning.

Source

Accelerating Scientific Applications in HPC Clusters with NVIDIA DPUs Using the MVAPICH2-DPU MPI Library
By Gilad Shainer | http://www.open-lab.net/blog/?p=33912 | Published 2021-06-28

High-performance computing (HPC) and AI have driven supercomputers into wide commercial use as the primary data processing engines enabling research, scientific discoveries, and product development. These systems can carry out complex simulations and unlock the new era of AI, where software writes software. Supercomputing leadership means scientific and innovation leadership…

Source

Achieve up to 75% Performance Improvement for Communication Intensive HPC Applications with NVTAGS
By Logan Herche | http://www.open-lab.net/blog/?p=33648 | Published 2021-06-23

Many GPU-accelerated HPC applications spend a substantial portion of their time in non-uniform, GPU-to-GPU communications. Additionally, in many HPC systems, different GPU pairs share communication links with varying bandwidth and latency. As a result, GPU assignment can substantially impact time to solution. Furthermore, on multi-node/multi-socket systems, communication performance can degrade…
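
As a point of reference for what topology-aware assignment improves on, the sketch below shows the common baseline of binding each rank to a GPU by its node-local rank. The communicator split is standard MPI-3; the mapping is deliberately naive (it ignores link topology), which is exactly the problem tools like NVTAGS address.

```c
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

/* Baseline rank-to-GPU mapping: each rank picks a device by its node-local
 * rank. This round-robin assignment ignores which GPU pairs share fast
 * links; it is shown only as the starting point topology-aware tools improve on. */
int main(int argc, char **argv)
{
    MPI_Comm local_comm;
    int world_rank, local_rank, num_devices;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Ranks on the same node end up in the same shared-memory communicator. */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &local_comm);
    MPI_Comm_rank(local_comm, &local_rank);

    cudaGetDeviceCount(&num_devices);
    cudaSetDevice(local_rank % num_devices);

    printf("world rank %d (local rank %d) -> GPU %d\n",
           world_rank, local_rank, local_rank % num_devices);

    MPI_Comm_free(&local_comm);
    MPI_Finalize();
    return 0;
}
```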

Source

Fast Multi-GPU collectives with NCCL
By Nathan Luehr | http://www.open-lab.net/blog/parallelforall/?p=6598 | Published 2016-04-07

Today many servers contain 8 or more GPUs. In principle then, scaling an application from one to many GPUs should provide a tremendous performance boost. But in practice, this benefit can be difficult to obtain. There are two common culprits behind poor multi-GPU scaling. The first is that enough parallelism has not been exposed to efficiently saturate the processors. The second reason for poor…
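
As a hedged sketch (assuming NCCL 2.x, at most eight devices, and no error checking), the following single-process program runs the kind of multi-GPU all-reduce discussed here, with one communicator, buffer, and stream per device and the per-device calls batched inside an NCCL group:

```c
#include <nccl.h>
#include <cuda_runtime.h>
#include <stdio.h>

/* Single-process all-reduce across every visible GPU. The per-device
 * ncclAllReduce calls are wrapped in a group so they launch as one
 * collective. Error checking is omitted for brevity. */
int main(void)
{
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    if (ndev > 8) ndev = 8;

    ncclComm_t   comms[8];
    cudaStream_t streams[8];
    float       *sendbuf[8], *recvbuf[8];
    int          devs[8];
    const size_t count = 1 << 20;              /* 1M floats per GPU */

    for (int i = 0; i < ndev; ++i) {
        devs[i] = i;
        cudaSetDevice(i);
        cudaMalloc((void **)&sendbuf[i], count * sizeof(float));
        cudaMalloc((void **)&recvbuf[i], count * sizeof(float));
        cudaMemset(sendbuf[i], 0, count * sizeof(float));
        cudaStreamCreate(&streams[i]);
    }

    ncclCommInitAll(comms, ndev, devs);

    ncclGroupStart();
    for (int i = 0; i < ndev; ++i)
        ncclAllReduce(sendbuf[i], recvbuf[i], count, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    ncclGroupEnd();

    for (int i = 0; i < ndev; ++i) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
        ncclCommDestroy(comms[i]);
    }
    printf("all-reduce across %d GPUs complete\n", ndev);
    return 0;
}
```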

Source

GPU Pro Tip: Track MPI Calls In The NVIDIA Visual Profiler
By Jeff Larkin (http://jefflarkin.com) | http://www.open-lab.net/blog/parallelforall/?p=5177 | Published 2015-05-06

Often when profiling GPU-accelerated applications that run on clusters, one needs to visualize MPI (Message Passing Interface) calls on the GPU timeline in the profiler. While tools like Vampir and Tau will allow programmers to see a big picture view of how a parallel application performs, sometimes all you need is a look at how MPI is affecting GPU performance on a single node using a simple tool…
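
One common way to get MPI calls onto the profiler timeline is to interpose on them through the standard PMPI profiling interface and bracket each call with an NVTX range. Below is a minimal sketch for MPI_Send alone; the same pattern applies to any other MPI function, and the program must be linked against -lnvToolsExt:

```c
#include <mpi.h>
#include <nvToolsExt.h>

/* PMPI interposition: our MPI_Send wraps the real call (PMPI_Send) in an
 * NVTX range, so time spent inside MPI appears as a named region on the
 * Visual Profiler timeline. */
int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    nvtxRangePushA("MPI_Send");
    int err = PMPI_Send(buf, count, datatype, dest, tag, comm);
    nvtxRangePop();
    return err;
}
```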

Source

Benchmarking GPUDirect RDMA on Modern Server Platforms
By Davide Rossetti | http://www.open-lab.net/blog/parallelforall/?p=3451 | Published 2014-10-08

NVIDIA GPUDirect RDMA is a technology that enables a direct path for data exchange between the GPU and third-party peer devices using standard features of PCI Express. Examples of third-party devices include network interfaces, video acquisition devices, storage adapters, and medical equipment. Enabled on Tesla and Quadro-class GPUs, GPUDirect RDMA relies on the ability of NVIDIA GPUs to expose…
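
Because GPUDirect RDMA bandwidth depends strongly on where the GPU and the peer device sit in the PCIe/NUMA topology, a useful first step when benchmarking is to print each GPU's PCI location and match it against the platform topology (for example, the output of nvidia-smi topo -m). A small, hypothetical helper using the CUDA runtime API, not taken from the post:

```c
#include <cuda_runtime.h>
#include <stdio.h>

/* Print the name and PCI bus ID of every visible GPU so its position in
 * the PCIe/NUMA topology can be checked against where the NIC or other
 * peer device is attached. */
int main(void)
{
    int ndev = 0;
    cudaGetDeviceCount(&ndev);

    for (int i = 0; i < ndev; ++i) {
        char busid[32];
        struct cudaDeviceProp prop;

        cudaGetDeviceProperties(&prop, i);
        cudaDeviceGetPCIBusId(busid, sizeof(busid), i);
        printf("GPU %d: %s at PCI %s\n", i, prop.name, busid);
    }
    return 0;
}
```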

Source

CUDA Pro Tip: Profiling MPI Applications
By Jiri Kraus | http://www.open-lab.net/blog/parallelforall/?p=3313 | Published 2014-06-19

When I profile MPI+CUDA applications, sometimes performance issues only occur for certain MPI ranks. To fix these, it's necessary to identify the MPI rank where the performance issue occurs. Before CUDA 6.5 it was hard to do this because the CUDA profiler only shows the PID of the processes and leaves the developer to figure out the mapping from PIDs to MPI ranks. Although the mapping can be done…
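
One simple workaround, sketched below with no assumptions about a particular profiler, is to have the application itself print the PID-to-rank mapping at startup so per-process profiler output can be correlated afterward:

```c
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

/* Print the PID-to-rank mapping at startup so that per-process profiler
 * output (which is keyed by PID) can be matched back to MPI ranks. */
int main(int argc, char **argv)
{
    int rank, size, hostlen;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &hostlen);

    printf("rank %d of %d: host %s, pid %d\n", rank, size, host, (int)getpid());

    /* ... rest of the application ... */

    MPI_Finalize();
    return 0;
}
```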

Source

Benchmarking CUDA-Aware MPI
By Jiri Kraus | http://www.parallelforall.com/?p=1171 | Published 2013-03-28

I introduced CUDA-aware MPI in my last post, with an introduction to MPI and a description of the functionality and benefits of CUDA-aware MPI. In this post I will demonstrate the performance of MPI through both synthetic and realistic benchmarks. Since you now know why CUDA-aware MPI is more efficient from a theoretical perspective, let's take a look at the results of MPI bandwidth and…
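
As a hedged sketch of the synthetic side of such a benchmark (assuming a CUDA-aware MPI build, exactly two ranks, and one GPU per rank), here is a device-to-device ping-pong that passes device pointers directly to MPI and reports the resulting bandwidth:

```c
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

/* Device-to-device ping-pong between ranks 0 and 1. With a CUDA-aware MPI
 * the device pointer is handed to MPI_Send/MPI_Recv directly; the library
 * (optionally via GPUDirect) handles the transfer. */
int main(int argc, char **argv)
{
    const int nbytes = 1 << 24;             /* 16 MiB message */
    const int iters  = 100;
    int rank;
    char *d_buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    cudaSetDevice(rank);                    /* assumes one GPU per rank */
    cudaMalloc((void **)&d_buf, nbytes);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; ++i) {
        if (rank == 0) {
            MPI_Send(d_buf, nbytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(d_buf, nbytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(d_buf, nbytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(d_buf, nbytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("bandwidth: %.2f GB/s\n", 2.0 * iters * nbytes / (t1 - t0) / 1e9);

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```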

Source

An Introduction to CUDA-Aware MPI
By Jiri Kraus | http://www.parallelforall.com/?p=1362 | Published 2013-03-14

MPI, the Message Passing Interface, is a standard API for communicating data via messages between distributed processes that is commonly used in HPC to build applications that can scale to multi-node computer clusters. As such, MPI is fully compatible with CUDA, which is designed for parallel computing on a single computer or node. There are many reasons for wanting to combine the two parallel…
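
The practical difference is that a CUDA-aware MPI accepts device pointers directly, while a plain MPI forces staging through host buffers around every call. A minimal, hypothetical exchange helper illustrating both paths (the function name and structure are illustrative, not from the post):

```c
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

/* The difference CUDA-aware MPI makes: without it, data must be staged
 * through host buffers around every MPI call; with it, the device pointer
 * goes to MPI directly and the library handles the GPU transfer. */
void exchange(double *d_buf, int count, int peer, int cuda_aware)
{
    if (cuda_aware) {
        /* CUDA-aware MPI: pass the device buffer straight to MPI. */
        MPI_Sendrecv_replace(d_buf, count, MPI_DOUBLE, peer, 0, peer, 0,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else {
        /* Plain MPI: stage through a host buffer on both sides of the call. */
        double *h_buf = (double *)malloc(count * sizeof(double));
        cudaMemcpy(h_buf, d_buf, count * sizeof(double), cudaMemcpyDeviceToHost);
        MPI_Sendrecv_replace(h_buf, count, MPI_DOUBLE, peer, 0, peer, 0,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        cudaMemcpy(d_buf, h_buf, count * sizeof(double), cudaMemcpyHostToDevice);
        free(h_buf);
    }
}
```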

Source
