NVIDIA Enterprise Reference Architectures (Enterprise RAs) can reduce the time and cost of deploying AI infrastructure solutions. They provide a streamlined approach for building flexible and cost-effective accelerated infrastructure while ensuring compatibility and interoperability. The latest Enterprise RA details an optimized cluster configuration for systems integrated with NVIDIA GH200…
Since its introduction more than 7 years ago, the CUDA Unified Memory programming model has steadily gained popularity among developers. Unified Memory provides a simple interface for prototyping GPU applications without manually migrating memory between host and device. Starting from the NVIDIA Pascal GPU architecture, Unified Memory enabled applications to use all available CPU and GPU memory…
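To make the model concrete, here is a minimal sketch (not code from the post) of the pattern it describes: a single managed allocation is initialized by the CPU and then updated by a GPU kernel, with no explicit copies. The kernel, the sizes, and the optional cudaMemPrefetchAsync hint are illustrative; on Pascal and later GPUs the same allocation may even exceed physical GPU memory.

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void scale(float *data, size_t n, float s) {
      size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
      if (i < n) data[i] *= s;
    }

    int main() {
      size_t n = 1 << 26;                          // example size; can exceed GPU memory on Pascal+
      float *data;
      cudaMallocManaged(&data, n * sizeof(float)); // one pointer, valid on CPU and GPU
      for (size_t i = 0; i < n; ++i) data[i] = 1.0f; // initialize directly on the CPU

      int device;
      cudaGetDevice(&device);
      cudaMemPrefetchAsync(data, n * sizeof(float), device, 0); // optional locality hint

      scale<<<(unsigned)((n + 255) / 256), 256>>>(data, n, 2.0f);
      cudaDeviceSynchronize();                     // pages migrate back on CPU access
      printf("data[0] = %f\n", data[0]);
      cudaFree(data);
      return 0;
    }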
Single-cell genomics research continues to advance drug discovery for disease prevention. For example, it has been pivotal in developing treatments for the current COVID-19 pandemic, identifying cells susceptible to infection, and revealing changes in the immune systems of infected patients. However, with the growing availability of large-scale single-cell datasets, it's clear that computing…
Many of today's applications process large volumes of data. While GPU architectures have very fast HBM or GDDR memory, they have limited capacity. Making the most of GPU performance requires the data to be as close to the GPU as possible. This is especially important for applications that iterate over the same data multiple times or have a high flops/byte ratio. Many real-world codes have to…
Modern computer architectures have a hierarchy of memories of varying size and performance. GPU architectures are approaching a terabyte per second of memory bandwidth that, coupled with high-throughput computational cores, creates an ideal device for data-intensive tasks. However, fast memory is expensive. Modern applications striving to solve larger and larger problems can be…
At the 2016 GPU Technology Conference in San Jose, NVIDIA CEO Jen-Hsun Huang announced the new NVIDIA Tesla P100, the most advanced accelerator ever built. Based on the new NVIDIA Pascal GP100 GPU and powered by ground-breaking technologies, Tesla P100 delivers the highest absolute performance for HPC, technical computing, deep learning, and many computationally intensive datacenter workloads.
Today I'm excited to announce the general availability of CUDA 8, the latest update to NVIDIA's powerful parallel computing platform and programming model. In this post I'll give a quick overview of the major new features of CUDA 8. To learn more you can watch the recording of my talk from GTC 2016, "CUDA 8 and Beyond". A crucial goal for CUDA 8 is to provide support for the powerful new…
Linear solvers are probably the most common tool in scientific computing applications. There are two basic classes of methods that can be used to solve an equation: direct and iterative. Direct methods are usually robust, but have additional computational complexity and memory capacity requirements. Unlike direct solvers, iterative solvers require minimal memory overhead and feature better…
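As a concrete instance of the iterative class, here is a minimal sketch of one dense Jacobi sweep in CUDA C++ (the kernel, the names, and the dense row-major storage are illustrative assumptions, not code from the post). It shows the memory behavior the excerpt alludes to: beyond the system itself, the method needs only two solution vectors that are swapped between sweeps.

    #include <cuda_runtime.h>

    // One Jacobi sweep for Ax = b with a dense, row-major A (illustrative).
    // Iterative methods like this need only the x_old/x_new scratch vectors,
    // unlike direct factorizations, which require storage for fill-in.
    __global__ void jacobi_sweep(const double *A, const double *b,
                                 const double *x_old, double *x_new, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i >= n) return;
      double sigma = 0.0;
      for (int j = 0; j < n; ++j)
        if (j != i) sigma += A[i * n + j] * x_old[j];  // off-diagonal terms
      x_new[i] = (b[i] - sigma) / A[i * n + i];        // solve row i for x_i
    }
    // Host side: launch repeatedly, swapping x_old and x_new,
    // until the residual b - Ax falls below a tolerance.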
The post Getting Started with OpenACC covered four steps to progressively accelerate your code with OpenACC. It's often necessary to use OpenACC directives to express both loop parallelism and data locality in order to get good performance with accelerators. After expressing available parallelism, excessive data movement generated by the compiler can be a bottleneck, and correcting this by adding…
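To make those two concerns concrete, here is a hedged sketch in C with OpenACC directives (the routine and variable names are illustrative, not from the post): parallel loop expresses the loop parallelism, while the enclosing data region fixes locality by keeping the arrays resident on the accelerator across the outer iterations instead of letting the compiler transfer them on every pass.

    // Illustrative routine: repeatedly applies b += c * a on the accelerator.
    void scale_add(const float *a, float *b, float c, int n, int nsteps) {
      // Data locality: copy a in once and b in/out once for the whole loop
      // nest, rather than once per outer iteration.
      #pragma acc data copyin(a[0:n]) copy(b[0:n])
      {
        for (int iter = 0; iter < nsteps; ++iter) {
          // Loop parallelism: each i iteration runs independently on the device.
          #pragma acc parallel loop
          for (int i = 0; i < n; ++i)
            b[i] += c * a[i];
        }
      }
    }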
Accelerated systems have become the new standard for high performance computing (HPC) as GPUs continue to raise the bar for both performance and energy efficiency. In 2012, Oak Ridge National Laboratory announced what was to become the world's fastest supercomputer, Titan, equipped with one NVIDIA GPU per CPU, more than 18,000 GPU accelerators in total. Titan established records not only in absolute…
Unified Memory is a CUDA feature that we've talked a lot about on Parallel Forall. CUDA 6 introduced Unified Memory, which dramatically simplifies GPU programming by giving programmers a single pointer to data which is accessible from either the GPU or the CPU. But this enhanced memory model has only been available to CUDA C/C++ programmers, until now. The new PGI Compiler release 14.7…
For more recent info on NVLink, check out the post, "How NVLink Will Enable Faster, Easier Multi-GPU Computing". NVIDIA GPU accelerators have emerged in high-performance computing as an energy-efficient way to provide significant compute capability. The Green500 supercomputer list makes this clear: the top 10 supercomputers on the list feature NVIDIA GPUs. Today at the 2014 GPU Technology…
As a CUDA developer, you will often need to control which devices your application uses. In a short-but-sweet post on the Acceleware blog, Chris Mason explains how. As Chris points out, robust applications should use the CUDA API to enumerate and select devices with appropriate capabilities at run time. To learn how, read the section on Device Enumeration in the CUDA Programming Guide.
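A minimal sketch of that pattern with the CUDA runtime API follows; the Pascal-or-newer threshold is just an example selection policy, not something the post prescribes.

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
      int count = 0;
      cudaGetDeviceCount(&count);           // how many CUDA devices are visible
      int chosen = -1;
      for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);  // name, memory, compute capability
        printf("Device %d: %s (SM %d.%d)\n", d, prop.name, prop.major, prop.minor);
        if (chosen < 0 && prop.major >= 6)  // example policy: require Pascal or newer
          chosen = d;
      }
      if (chosen < 0) { fprintf(stderr, "no suitable device\n"); return 1; }
      cudaSetDevice(chosen);                // subsequent CUDA calls use this device
      return 0;
    }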
With CUDA 6, NVIDIA introduced one of the most dramatic programming model improvements in the history of the CUDA platform: Unified Memory. In a typical PC or cluster node today, the memories of the CPU and GPU are physically distinct and separated by the PCI-Express bus. Before CUDA 6, that is exactly how the programmer had to view things. Data that is shared between the CPU and GPU must be…
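The before-and-after contrast the excerpt sets up looks roughly like this (a sketch under assumed names, not the post's own listing): before CUDA 6 the programmer kept two copies and moved data with explicit transfers; with Unified Memory a single managed pointer is valid on both processors.

    #include <cstdlib>
    #include <cuda_runtime.h>

    __global__ void touch(float *p, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) p[i] += 1.0f;
    }

    void before_cuda6(int n) {
      size_t bytes = n * sizeof(float);
      float *h = (float *)malloc(bytes);   // host copy (initialization omitted)
      float *d;
      cudaMalloc(&d, bytes);               // separate device copy
      cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);  // explicit migration in
      touch<<<(n + 255) / 256, 256>>>(d, n);
      cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);  // explicit migration out
      cudaFree(d);
      free(h);
    }

    void with_unified_memory(int n) {
      float *u;
      cudaMallocManaged(&u, n * sizeof(float));  // one pointer for CPU and GPU
      touch<<<(n + 255) / 256, 256>>>(u, n);
      cudaDeviceSynchronize();                   // after this, read u on the CPU
      cudaFree(u);
    }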