C++ – NVIDIA Technical Blog News and tutorials for developers, data scientists, and IT admins 2025-05-16T23:50:38Z http://www.open-lab.net/blog/feed/ Michelle Horton <![CDATA[Optimizing Drug Discovery with CUDA Graphs, Coroutines, and GPU Workflows]]> http://www.open-lab.net/blog/?p=90780 2024-10-31T16:21:20Z 2024-10-23T17:28:49Z Pharmaceutical research demands fast, efficient simulations to predict how molecules interact, speeding up drug discovery. Jiqun Tu, a senior developer...]]> Pharmaceutical research demands fast, efficient simulations to predict how molecules interact, speeding up drug discovery. Jiqun Tu, a senior developer...Illustration representing drug discovery.

Pharmaceutical research demands fast, efficient simulations to predict how molecules interact, speeding up drug discovery. Jiqun Tu, a senior developer technology engineer at NVIDIA, and Ellery Russell, tech lead for the Desmond engine at Schrödinger, explore advanced GPU optimization techniques designed to accelerate molecular dynamics simulations. In this NVIDIA GTC 2024 session…

Source

]]>
0
Ioana Boier <![CDATA[How to Accelerate Quantitative Finance with ISO C++ Standard Parallelism]]> http://www.open-lab.net/blog/?p=78691 2024-04-09T23:45:35Z 2024-03-06T19:00:00Z Quantitative finance libraries are software packages that consist of mathematical, statistical, and, more recently, machine learning models designed for use in...]]> Quantitative finance libraries are software packages that consist of mathematical, statistical, and, more recently, machine learning models designed for use in...

Quantitative finance libraries are software packages that consist of mathematical, statistical, and, more recently, machine learning models designed for use in quantitative investment contexts. They contain a wide range of functionalities, often proprietary, to support the valuation, risk management, construction, and optimization of investment portfolios. Financial firms that develop such…

Source

]]>
1
Damien Fagnou <![CDATA[Accelerate 3D Workflows with Modular, OpenUSD-Powered Omniverse Release]]> http://www.open-lab.net/blog/?p=68852 2024-03-13T17:42:54Z 2023-08-08T18:30:00Z The latest release of NVIDIA Omniverse delivers an exciting collection of new features based on Omniverse Kit 105, making it easier than ever for developers to...]]> The latest release of NVIDIA Omniverse delivers an exciting collection of new features based on Omniverse Kit 105, making it easier than ever for developers to...

The latest release of NVIDIA Omniverse delivers an exciting collection of new features based on Omniverse Kit 105, making it easier than ever for developers to get started building 3D simulation tools and workflows. Built on Universal Scene Description, known as OpenUSD, and NVIDIA RTX and AI technologies, Omniverse enables you to create advanced, real-time 3D simulation applications for…

Source

]]>
0
Peter Entschev <![CDATA[Debugging a Mixed Python and C Language Stack]]> http://www.open-lab.net/blog/?p=63641 2023-06-09T22:28:19Z 2023-04-20T17:00:00Z Debugging is difficult. Debugging across multiple languages is especially challenging, and debugging across devices often requires a team with varying skill...]]> Debugging is difficult. Debugging across multiple languages is especially challenging, and debugging across devices often requires a team with varying skill...

Debugging is difficult. Debugging across multiple languages is especially challenging, and debugging across devices often requires a team with varying skill sets and expertise to reveal the underlying problem. Yet projects often require multiple languages to ensure high performance where necessary, a user-friendly experience, and compatibility where possible. Unfortunately…

Source

]]>
0
Daniel Juenger <![CDATA[Maximizing Performance with Massively Parallel Hash Maps on GPUs]]> http://www.open-lab.net/blog/?p=61480 2023-05-23T23:50:12Z 2023-03-06T17:30:00Z Decades of computer science history have been devoted to devising solutions for efficient storage and retrieval of information. Hash maps (or hash tables) are a...]]> Decades of computer science history have been devoted to devising solutions for efficient storage and retrieval of information. Hash maps (or hash tables) are a...

Decades of computer science history have been devoted to devising solutions for efficient storage and retrieval of information. Hash maps (or hash tables) are a popular data structure for information storage given their amortized, constant-time guarantees for the insertion and retrieval of elements. However, despite their prevalence, hash maps are seldom discussed in the context of GPU…

Source

]]>
1
Michelle Horton <![CDATA[New Course: Scaling GPU-Accelerated Applications with the C++ Standard Library]]> http://www.open-lab.net/blog/?p=61534 2023-06-09T22:37:49Z 2023-03-02T17:00:00Z Learn how to write scalable GPU-accelerated hybrid applications using C++ standard language features alongside MPI in this interactive hands-on self-paced...]]> Learn how to write scalable GPU-accelerated hybrid applications using C++ standard language features alongside MPI in this interactive hands-on self-paced...Illustration of different AI workflows in enterprise settings such as airports.

Learn how to write scalable GPU-accelerated hybrid applications using C++ standard language features alongside MPI in this interactive, hands-on, self-paced course.

Source

]]>
0
Julien Jomier <![CDATA[Rapidly Build AI-Streaming Apps with Python and C++]]> http://www.open-lab.net/blog/?p=59300 2023-05-24T00:11:43Z 2023-01-09T21:00:00Z The computational needs for AI processing of sensor streams at the edge are increasingly demanding. Edge devices must keep up with high rates of incoming data...]]> The computational needs for AI processing of sensor streams at the edge are increasingly demanding. Edge devices must keep up with high rates of incoming data...

The computational needs for AI processing of sensor streams at the edge are increasingly demanding. Edge devices must keep up with high rates of incoming data streams, processing, displaying, archiving, and streaming results or closing a control loop in real time. This requires powerful, efficient, and accurate hardware and software solutions capable of high performance computing.

Source

]]>
0
Michelle Horton <![CDATA[New Course: GPU Acceleration with the C++ Standard Library]]> http://www.open-lab.net/blog/?p=58113 2023-06-12T08:26:38Z 2022-12-19T20:00:00Z Learn how to write simple, portable, parallel-first GPU-accelerated applications using only C++ standard language features in this self-paced course from the...]]> Learn how to write simple, portable, parallel-first GPU-accelerated applications using only C++ standard language features in this self-paced course from the...

Learn how to write simple, portable, parallel-first GPU-accelerated applications using only C++ standard language features in this self-paced course from the NVIDIA Deep Learning Institute…

Source

]]>
0
Aastha Jhunjhunwala <![CDATA[Accelerating GPU Applications with NVIDIA Math Libraries]]> http://www.open-lab.net/blog/?p=50947 2023-06-12T09:12:17Z 2022-07-26T17:00:00Z There are three main ways to accelerate GPU applications: compiler directives, programming languages, and preprogrammed libraries. Compiler directives such as...]]> There are three main ways to accelerate GPU applications: compiler directives, programming languages, and preprogrammed libraries. Compiler directives such as...

There are three main ways to accelerate GPU applications: compiler directives, programming languages, and preprogrammed libraries. Compiler directives such as OpenACC allow you to smoothly port your code to the GPU for acceleration with a directive-based programming model. While it is simple to use, it may not provide optimal performance in certain scenarios. Programming languages such as…

Source

]]>
0
Jackson Marusarz <![CDATA[Improve Guidance and Performance Visualization with the New Nsight Compute]]> http://www.open-lab.net/blog/?p=48546 2024-08-28T17:45:26Z 2022-05-31T16:00:00Z NVIDIA Nsight Compute is an interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging through a user...]]> NVIDIA Nsight Compute is an interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging through a user...CUDA-X logo graphic

NVIDIA Nsight Compute is an interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging through a user interface and a command-line tool. Nsight Compute 2022.2 includes features to expand the supported environments and workflows for CUDA kernel profiling and optimization. Download now. The following outlines the feature highlights of…

Source

]]>
0
Cliff Burdick <![CDATA[Speeding up Numerical Computing in C++ with a Python-like Syntax in NVIDIA MatX]]> http://www.open-lab.net/blog/?p=44259 2024-03-07T00:54:10Z 2022-02-24T18:56:19Z Rob Smallshire once said, "You can write faster code in C++, but write code faster in Python." Since its release more than a decade ago, CUDA has given C and...]]> Rob Smallshire once said, "You can write faster code in C++, but write code faster in Python." Since its release more than a decade ago, CUDA has given C and...

Rob Smallshire once said, "You can write faster code in C++, but write code faster in Python." Since its release more than a decade ago, CUDA has given C and C++ programmers the ability to maximize the performance of their code on NVIDIA GPUs. More recently, libraries such as CuPy and PyTorch allowed developers of interpreted languages to leverage the speed of the optimized CUDA libraries…

Source

]]>
0
Chaitrali Joshi <![CDATA[NVIDIA GTC: A Complete Overview of Nsight Developer Tools]]> http://www.open-lab.net/blog/?p=40189 2024-08-28T18:17:08Z 2021-11-11T01:21:29Z The Nsight suite of Developer Tools provide insightful tracing, debugging, profiling, and other analyses to optimize your complex computational applications...]]> The Nsight suite of Developer Tools provide insightful tracing, debugging, profiling, and other analyses to optimize your complex computational applications...Nsight logo

The Nsight suite of Developer Tools provides insightful tracing, debugging, profiling, and other analyses to optimize your complex computational applications across NVIDIA GPUs and CPUs, including x86, Arm, and Power architectures. NVIDIA Nsight Systems is a performance analysis tool designed to visualize, analyze, and optimize programming models, and to tune to scale efficiently across any…

Source

]]>
0
Chaitrali Joshi <![CDATA[Announcing Nsight Deep Learning Designer 2021.1 – A Tool for Efficient Deep Learning Model Design and Development]]> http://www.open-lab.net/blog/?p=36010 2024-08-28T17:47:33Z 2021-08-10T17:00:00Z Nsight Deep Learning Designer 2021.1 Today NVIDIA announced Nsight DL Designer - the first in-class integrated development environment to support efficient...]]> Nsight Deep Learning Designer 2021.1 Today NVIDIA announced Nsight DL Designer - the first in-class integrated development environment to support efficient...

Today NVIDIA announced Nsight DL Designer, the first-in-class integrated development environment to support efficient design of deep neural networks for in-app inference. Download now! This tool aims to streamline the often iterative process of designing deep neural network models for in-app inferencing by providing efficient support at every stage of the process. Nsight DL Designer is a…

Source

]]>
0
Ben Zaitlen https://www.linkedin.com/in/benjamin-zaitlen-62ab7b4/ <![CDATA[NVIDIA Tools Extension API: An Annotation Tool for Profiling Code in Python and C/C++]]> http://www.open-lab.net/blog/?p=24485 2023-02-13T18:03:04Z 2021-03-10T18:49:02Z As PyData leverages much of the static language world for speed including CUDA, we need tools which not only profile and measure across languages but also...]]> As PyData leverages much of the static language world for speed including CUDA, we need tools which not only profile and measure across languages but also...

As PyData leverages much of the static language world for speed, including CUDA, we need tools that profile and measure not only across languages but also across devices, CPU and GPU. There are many great profiling tools within the Python ecosystem, from profilers like cProfile to tools such as py-spy and Viztracer that can observe code execution in C extensions. Yet none of the Python profilers can…

Source

]]>
1
Mark Harris <![CDATA[Fast, Flexible Allocation for NVIDIA CUDA with RAPIDS Memory Manager]]> http://www.open-lab.net/blog/?p=22554 2022-08-21T23:40:48Z 2020-12-08T19:27:00Z When I joined the RAPIDS team in 2018, NVIDIA CUDA device memory allocation was a performance problem. RAPIDS cuDF allocates and deallocates memory at high...]]> When I joined the RAPIDS team in 2018, NVIDIA CUDA device memory allocation was a performance problem. RAPIDS cuDF allocates and deallocates memory at high...Image depicting NVIDIA CEO Jen-Hsun Huang explaining the importance of the RAPIDS launch demo at GTC Europe 2018.

When I joined the RAPIDS team in 2018, NVIDIA CUDA device memory allocation was a performance problem. RAPIDS cuDF allocates and deallocates memory at high frequency, because its APIs generally create new objects rather than modifying them in place. The overhead of device memory allocation and deallocation, and the synchronization they imply, was holding RAPIDS back. My first task for RAPIDS was to help with this problem, so I created a rough…

Source

]]>
9
Michael Wolfe <![CDATA[Detecting Divergence Using PCAST to Compare GPU to CPU Results]]> http://www.open-lab.net/blog/?p=22165 2022-08-21T23:40:47Z 2020-11-18T16:00:00Z Parallel Compiler Assisted Software Testing (PCAST) is a feature available in the NVIDIA HPC Fortran, C++, and C compilers. PCAST has two use cases. The first...]]> Parallel Compiler Assisted Software Testing (PCAST) is a feature available in the NVIDIA HPC Fortran, C++, and C compilers. PCAST has two use cases. The first...PCAST helps to quickly isolate divergence between CPU and GPU results so you can isolate bugs or verify your results are OK even if they aren't identical.

Parallel Compiler Assisted Software Testing (PCAST) is a feature available in the NVIDIA HPC Fortran, C++, and C compilers. PCAST has two use cases. The first is testing changes to parts of a program, new compile-time flags, or a port to a new compiler or to a new processor. You might want to test whether a new library gives the same result, or test the safety of adding OpenMP parallelism…

Source

]]>
0
David Olsen <![CDATA[Accelerating Standard C++ with GPUs Using stdpar]]> http://www.open-lab.net/blog/?p=18511 2023-12-05T23:58:18Z 2020-08-04T23:30:00Z Historically, accelerating your C++ code with GPUs has not been possible in Standard C++ without using language extensions or additional libraries: CUDA C++...]]> Historically, accelerating your C++ code with GPUs has not been possible in Standard C++ without using language extensions or additional libraries: CUDA C++...Standard Parallellism in C++

Historically, accelerating your C++ code with GPUs has not been possible in Standard C++ without using language extensions or additional libraries such as CUDA C++. In many cases, the results of these ports are worth the effort. But what if you could get the same effect without that cost? What if you could take your Standard C++ code and accelerate it on a GPU? Now you can!

Source

]]>
7
Piotr Wojciechowski <![CDATA[How to Speed Up Deep Learning Inference Using TensorRT]]> http://www.open-lab.net/blog/?p=12738 2022-10-10T18:51:43Z 2018-11-08T15:00:52Z Looking for more? Check out the hands-on DLI training course: Optimization and Deployment of TensorFlow Models with TensorRT...]]> Looking for more? Check out the hands-on DLI training course: Optimization and Deployment of TensorFlow Models with TensorRT...

Source

]]>
18
Roger Allen <![CDATA[Accelerated Ray Tracing in One Weekend in CUDA]]> http://www.open-lab.net/blog/?p=12666 2022-08-21T23:39:12Z 2018-11-05T14:00:37Z Recent announcements of NVIDIA's new Turing GPUs, RTX technology, and Microsoft's DirectX Ray Tracing have spurred a renewed interest in ray tracing. Using...]]> Recent announcements of NVIDIA's new Turing GPUs, RTX technology, and Microsoft's DirectX Ray Tracing have spurred a renewed interest in ray tracing. Using...

Source

]]>
25
Sami Kama <![CDATA[TensorRT Integration Speeds Up TensorFlow Inference]]> http://www.open-lab.net/blog/?p=9984 2022-08-21T23:38:49Z 2018-03-27T17:33:00Z Update, May 9, 2018: TensorFlow v1.7 and above integrates with TensorRT 3.0.4. NVIDIA is working on supporting the integration for a wider set of configurations...]]> Update, May 9, 2018: TensorFlow v1.7 and above integrates with TensorRT 3.0.4. NVIDIA is working on supporting the integration for a wider set of configurations...

Update, May 9, 2018: TensorFlow v1.7 and above integrates with TensorRT 3.0.4. NVIDIA is working on supporting the integration for a wider set of configurations and versions. We'll publish updates when these become available. Meanwhile, if you're using , simply download TensorRT files for Ubuntu 14.04, not 16.04, no matter what version of Ubuntu you're running.

Source

]]>
40
Florent Duguet <![CDATA[Hybridizer: High-Performance C# on GPUs]]> http://www.open-lab.net/blog/parallelforall/?p=8788 2022-08-21T23:38:36Z 2017-12-14T00:33:07Z Figure 1. The Hybridizer Pipeline. Hybridizer is a compiler from Altimesh that lets...]]> Figure 1. The Hybridizer Pipeline. Hybridizer is a compiler from Altimesh that lets...Altimesh Logo

Hybridizer is a compiler from Altimesh that lets you program GPUs and other accelerators from C# code or .NET Assembly. Using decorated symbols to express parallelism, Hybridizer generates source code or binaries optimized for multicore CPUs and GPUs. In this blog post we illustrate the CUDA target. Figure 1 shows the Hybridizer compilation pipeline. Using parallelization patterns such as…

Source

]]>
4
Andrew Kerr <![CDATA[CUTLASS: Fast Linear Algebra in CUDA C++]]> http://www.open-lab.net/blog/parallelforall/?p=8708 2023-02-13T17:46:48Z 2017-12-06T04:03:29Z Update May 21, 2018: CUTLASS 1.0 is now available as Open Source software at the CUTLASS repository. CUTLASS 1.0 has changed substantially from our preview...]]> Update May 21, 2018: CUTLASS 1.0 is now available as Open Source software at the CUTLASS repository. CUTLASS 1.0 has changed substantially from our preview...

Update May 21, 2018: CUTLASS 1.0 is now available as Open Source software at the CUTLASS repository. CUTLASS 1.0 has changed substantially from our preview release described in the blog post below. We have decomposed the structure of the GEMM computation into deeper, structured primitives for loading data, computing predicate masks, streaming data at each level of the GEMM hierarchy…

Source

]]>
13
Jaydeep Marathe <![CDATA[New Compiler Features in CUDA 8]]> http://www.open-lab.net/blog/parallelforall/?p=7346 2022-08-21T23:38:01Z 2016-11-08T07:14:00Z CUDA 8 is one of the most significant updates in the history of the CUDA platform. In addition to Unified Memory and the many new API and library features in...]]> CUDA 8 is one of the most significant updates in the history of the CUDA platform. In addition to Unified Memory and the many new API and library features in...

Source

]]>
3
Brad Nemire <![CDATA[Pedestrian-Following Service Robots Made Possible with CUDA Acceleration]]> http://news.www.open-lab.net/?p=6941 2022-08-21T23:41:49Z 2016-01-13T17:53:13Z A team of researchers from Seoul National University built a pedestrian-following service robot to drive smart shopping carts and other autonomous helpers. The...]]> A team of researchers from Seoul National University built a pedestrian-following service robot to drive smart shopping carts and other autonomous helpers. The...

A team of researchers from Seoul National University built a pedestrian-following service robot to drive smart shopping carts and other autonomous helpers. The performance of their initial CPU-only implementation was "not acceptable for a real-time system," so they now use a GPU-accelerated CUDA implementation for a 13x performance boost. Mobile robots that track a person have received…

Source

]]>
0
Brad Nemire <![CDATA[NVIDIA to Benefit from Shift to GPU-powered Deep Learning]]> http://news.www.open-lab.net/?p=6663 2023-11-03T07:15:34Z 2015-11-10T22:59:12Z Wired discusses Google's announcement that it is open sourcing its TensorFlow machine learning system - noting the system uses GPUs to both train and run...]]> Wired discusses Google's announcement that it is open sourcing its TensorFlow machine learning system - noting the system uses GPUs to both train and run...

Wired discusses Google's announcement that it is open sourcing its TensorFlow machine learning system, noting the system uses GPUs to both train and run artificial intelligence services at the company. Inside Google, when tackling tasks like image recognition, speech recognition, and language translation, TensorFlow depends on machines equipped with GPUs that were originally designed to render…

Source

]]>
0
Brad Nemire <![CDATA[Performance Portability for GPUs and CPUs with OpenACC]]> http://news.www.open-lab.net/?p=6632 2022-08-21T23:41:33Z 2015-10-29T22:30:49Z New PGI compiler release includes support for C++ and Fortran applications to run in parallel on multi-core CPUs or GPU accelerators. OpenACC gives scientists...]]> New PGI compiler release includes support for C++ and Fortran applications to run in parallel on multi-core CPUs or GPU accelerators. OpenACC gives scientists...

New PGI compiler release includes support for C++ and Fortran applications to run in parallel on multi-core CPUs or GPU accelerators. OpenACC gives scientists and researchers a simple and powerful way to accelerate scientific computing applications incrementally. With the PGI Compiler 15.10 release, OpenACC enables performance portability between accelerators and multicore CPUs.

Source

]]>
0
Mark Harris <![CDATA[Simple, Portable Parallel C++ with Hemi 2 and CUDA 7.5]]> http://www.open-lab.net/blog/parallelforall/?p=5917 2022-08-21T23:37:38Z 2015-09-21T11:44:48Z The last two releases of CUDA have added support for the powerful new features of C++. In the post The Power of C++11 in CUDA 7 I discussed the importance...]]> The last two releases of CUDA have added support for the powerful new features of C++. In the post The Power of C++11 in CUDA 7 I discussed the importance...

The last two releases of CUDA have added support for the powerful new features of C++. In the post The Power of C++11 in CUDA 7 I discussed the importance of C++11 for parallel programming on GPUs, and in the post New Features in CUDA 7.5 I introduced a new experimental feature in the NVCC CUDA C++ compiler: support for GPU Lambda expressions. Lambda expressions, introduced in C++11…

Source

]]>
3
Daniel Egloff <![CDATA[.NET Cloud Computing with Alea GPU]]> http://www.open-lab.net/blog/parallelforall/?p=5677 2025-05-01T18:34:20Z 2015-08-04T04:25:42Z Cloud computing is all about making resources available on demand, and its availability, flexibility, and lower cost has helped it take commercial computing by...]]> Cloud computing is all about making resources available on demand, and its availability, flexibility, and lower cost has helped it take commercial computing by...

Cloud computing is all about making resources available on demand, and its availability, flexibility, and lower cost have helped it take commercial computing by storm. At the Microsoft Build 2015 conference in San Francisco, Microsoft revealed that its Azure cloud computing platform is averaging over 90 thousand new customers per month and contains more than 1.4 million SQL databases being used by…

Source

]]>
0
Christopher Sewell <![CDATA[GPU-Accelerated Cosmological Analysis on the Titan Supercomputer]]> http://www.open-lab.net/blog/parallelforall/?p=5607 2022-08-21T23:37:34Z 2015-07-21T05:24:02Z Ever looked up in the sky and wondered where it all came from? Cosmologists are in the same boat, trying to understand how the Universe arrived at the structure...]]> Ever looked up in the sky and wondered where it all came from? Cosmologists are in the same boat, trying to understand how the Universe arrived at the structure...

Ever looked up in the sky and wondered where it all came from? Cosmologists are in the same boat, trying to understand how the Universe arrived at the structure we observe today. They use supercomputers to follow the fate of very small initial fluctuations in an otherwise uniform density. As time passes, gravity causes the small fluctuations to grow, eventually forming the complex structures that…

Source

]]>
0
Paresh Kharya <![CDATA[Introducing the NVIDIA OpenACC Toolkit]]> http://www.open-lab.net/blog/parallelforall/?p=5569 2022-11-28T18:20:54Z 2015-07-13T07:01:55Z Programmability is crucial to accelerated computing, and NVIDIA's CUDA Toolkit has been critical to the success of GPU computing. Over three million CUDA...]]> Programmability is crucial to accelerated computing, and NVIDIA's CUDA Toolkit has been critical to the success of GPU computing. Over three million CUDA...

Programmability is crucial to accelerated computing, and NVIDIA's CUDA Toolkit has been critical to the success of GPU computing. Over three million CUDA Toolkits have been downloaded since its first launch. However, there are many scientists and researchers yet to benefit from GPU computing. These scientists have limited time to learn and apply a parallel programming language, and they often have…

Source

]]>
2
Mark Harris <![CDATA[New Features in CUDA 7.5]]> http://www.open-lab.net/blog/parallelforall/?p=5529 2023-02-13T18:15:18Z 2015-07-08T07:01:34Z Today I'm happy to announce that the CUDA Toolkit 7.5 Release Candidate is now available. The CUDA Toolkit 7.5 adds support for FP16 storage for up to 2x larger...]]> Today I'm happy to announce that the CUDA Toolkit 7.5 Release Candidate is now available. The CUDA Toolkit 7.5 adds support for FP16 storage for up to 2x larger...

Today I'm happy to announce that the CUDA Toolkit 7.5 Release Candidate is now available. The CUDA Toolkit 7.5 adds support for FP16 storage for up to 2x larger data sets and reduced memory bandwidth, cuSPARSE GEMVI routines, instruction-level profiling and more. Read on for full details. CUDA 7.5 expands support for 16-bit floating point (FP16) data storage and arithmetic…

Source

]]>
66
Daniel Egloff <![CDATA[Accelerate .NET Applications with Alea GPU]]> http://www.open-lab.net/blog/parallelforall/?p=5220 2022-10-10T18:45:02Z 2015-05-21T06:36:00Z Today software companies use frameworks such as .NET to target multiple platforms from desktops to mobile phones with a single code base to reduce costs by...]]> Today software companies use frameworks such as .NET to target multiple platforms from desktops to mobile phones with a single code base to reduce costs by...

Today software companies use frameworks such as .NET to target multiple platforms from desktops to mobile phones with a single code base to reduce costs by leveraging existing libraries and to cope with changing trends. While developers can easily write scalable parallel code for multi-core CPUs on .NET with libraries such as the task parallel library, they face a bigger challenge using GPUs to…

Source

]]>
1
Mark Harris <![CDATA[C++11 in CUDA: Variadic Templates]]> http://www.open-lab.net/blog/parallelforall/?p=5011 2022-08-21T23:37:31Z 2015-03-27T05:16:56Z CUDA 7 adds C++11 feature support to nvcc, the CUDA C++ compiler. This means that you can use C++11 features not only in your host code compiled with nvcc, but...]]> CUDA 7 adds C++11 feature support to nvcc, the CUDA C++ compiler. This means that you can use C++11 features not only in your host code compiled with nvcc, but...

Source

]]>
6
Mark Harris <![CDATA[The Power of C++11 in CUDA 7]]> http://www.open-lab.net/blog/parallelforall/?p=4999 2022-08-21T23:37:31Z 2015-03-18T08:48:26Z Today I'm excited to announce the official release of CUDA 7, the latest release of the popular CUDA Toolkit. Download the CUDA Toolkit version 7 now from CUDA...]]> Today I'm excited to announce the official release of CUDA 7, the latest release of the popular CUDA Toolkit. Download the CUDA Toolkit version 7 now from CUDA...

Today I'm excited to announce the official release of CUDA 7, the latest release of the popular CUDA Toolkit. Download the CUDA Toolkit version 7 now from CUDA Zone! CUDA 7 has a huge number of improvements and new features, including C++11 support, the new cuSOLVER library, and support for Runtime Compilation. In a previous post I told you about the features of CUDA 7, so I won't repeat myself…

Source

]]>
7
Pavan Yalamanchilli <![CDATA[ArrayFire: A Portable Open-Source Accelerated Computing Library]]> http://www.open-lab.net/blog/parallelforall/?p=4135 2022-10-10T18:43:42Z 2014-12-09T05:20:20Z The ArrayFire library is a high-performance software library with a focus on portability and productivity. It supports highly tuned, GPU-accelerated algorithms...]]> The ArrayFire library is a high-performance software library with a focus on portability and productivity. It supports highly tuned, GPU-accelerated algorithms...ArrayFire Logo

The ArrayFire library is a high-performance software library with a focus on portability and productivity. It supports highly tuned, GPU-accelerated algorithms using an easy-to-use API. ArrayFire wraps GPU memory into a simple "array" object, enabling developers to process vectors, matrices, and volumes on the GPU using high-level routines, without having to get involved with device kernel code.

Source

]]>
1
Jeremy Appleyard <![CDATA[CUDA Pro Tip: Optimize for Pointer Aliasing]]> http://www.open-lab.net/blog/parallelforall/?p=3431 2022-08-21T23:37:07Z 2014-08-08T01:29:25Z Often cited as the main reason that naïve C/C++ code cannot match FORTRAN performance, pointer aliasing is an important topic to understand when considering...]]> Often cited as the main reason that naïve C/C++ code cannot match FORTRAN performance, pointer aliasing is an important topic to understand when considering...GPU Pro Tip

Often cited as the main reason that naïve C/C++ code cannot match FORTRAN performance, pointer aliasing is an important topic to understand when considering optimizations for your C/C++ code. In this tip I will describe what pointer aliasing is and a simple way to alter your code so that it does not harm your application performance. Two pointers alias if the memory to which they point…

Source

]]>
13
Mark Harris <![CDATA[CUDA Pro Tip: Occupancy API Simplifies Launch Configuration]]> http://www.open-lab.net/blog/parallelforall/?p=3366 2022-08-21T23:37:06Z 2014-07-18T04:43:39Z CUDA programmers often need to decide on a block size to use for a kernel launch. For key kernels, it's important to understand the constraints of the kernel and...]]> CUDA programmers often need to decide on a block size to use for a kernel launch. For key kernels, it's important to understand the constraints of the kernel and...GPU Pro Tip

CUDA programmers often need to decide on a block size to use for a kernel launch. For key kernels, it's important to understand the constraints of the kernel and the GPU it is running on to choose a block size that will result in good performance. One common heuristic used to choose a good block size is to aim for high occupancy, which is the ratio of the number of active warps per multiprocessor…

Source

]]>
12
Tony Scudiero <![CDATA[Separate Compilation and Linking of CUDA C++ Device Code]]> http://www.open-lab.net/blog/parallelforall/?p=2522 2022-08-21T23:37:02Z 2014-04-22T15:03:27Z Managing complexity in large programs requires breaking them down into components that are responsible for small, well-defined portions of the overall program....]]> Managing complexity in large programs requires breaking them down into components that are responsible for small, well-defined portions of the overall program....

Source

]]>
39
Mark Harris <![CDATA[Unified Memory in CUDA 6]]> http://www.open-lab.net/blog/parallelforall/?p=2221 2022-08-21T23:36:58Z 2013-11-18T15:59:27Z With CUDA 6, NVIDIA introduced one of the most dramatic programming model improvements in the history of the CUDA platform, Unified Memory. In a typical PC or...]]> With CUDA 6, NVIDIA introduced one of the most dramatic programming model improvements in the history of the CUDA platform, Unified Memory. In a typical PC or...Unified Memory

With CUDA 6, NVIDIA introduced one of the most dramatic programming model improvements in the history of the CUDA platform, Unified Memory. In a typical PC or cluster node today, the memories of the CPU and GPU are physically distinct and separated by the PCI-Express bus. Before CUDA 6, that is exactly how the programmer had to view things. Data that is shared between the CPU and GPU must be…
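A sketch of the pattern (requires a CUDA-capable GPU and `nvcc` to run; before CUDA 6, this required separate host and device buffers plus explicit `cudaMemcpy` calls in each direction):

```cpp
#include <cuda_runtime.h>

__global__ void increment(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    int n = 1 << 20;
    int* data;
    cudaMallocManaged(&data, n * sizeof(int));    // one managed allocation
    for (int i = 0; i < n; ++i) data[i] = i;      // CPU writes it directly
    increment<<<(n + 255) / 256, 256>>>(data, n); // GPU uses the same pointer
    cudaDeviceSynchronize();                      // required before the CPU reads again
    cudaFree(data);
    return 0;
}
```

The single pointer is valid on both processors; the runtime migrates the pages as needed.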

Source

]]>
87
Mark Harris <![CDATA[How to Overlap Data Transfers in CUDA C/C++]]> http://www.parallelforall.com/?p=883 2022-08-21T23:36:49Z 2012-12-14T02:24:51Z In our last CUDA C/C++ post we discussed how to transfer data efficiently between the host and device.  In this post, we discuss how to overlap data...]]> In our last CUDA C/C++ post we discussed how to transfer data efficiently between the host and device.  In this post, we discuss how to overlap data...

In our last CUDA C/C++ post we discussed how to transfer data efficiently between the host and device. In this post, we discuss how to overlap data transfers with computation on the host, computation on the device, and in some cases other data transfers between the host and device. Achieving overlap between data transfers and other operations requires the use of CUDA streams, so first let's learn…
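The overlap pattern can be sketched as follows (a fragment, not a complete program: `kernel`, the device buffer `d_a`, the element count `n`, and the pinned host buffer `h_a` allocated with `cudaMallocHost` are assumed; pinned memory is what makes the async copies truly asynchronous):

```cpp
// Split the work into chunks and issue each chunk's host-to-device copy,
// kernel launch, and device-to-host copy into its own stream, so copies
// for one chunk overlap with computation on another.
const int nStreams = 4;
cudaStream_t stream[nStreams];
for (int i = 0; i < nStreams; ++i) cudaStreamCreate(&stream[i]);

int chunk = n / nStreams;
for (int i = 0; i < nStreams; ++i) {
    int off = i * chunk;
    cudaMemcpyAsync(&d_a[off], &h_a[off], chunk * sizeof(float),
                    cudaMemcpyHostToDevice, stream[i]);
    kernel<<<chunk / 256, 256, 0, stream[i]>>>(&d_a[off], chunk);
    cudaMemcpyAsync(&h_a[off], &d_a[off], chunk * sizeof(float),
                    cudaMemcpyDeviceToHost, stream[i]);
}
cudaDeviceSynchronize();  // wait for all streams to drain
```

Operations within one stream run in order; operations in different streams may overlap when the hardware has free copy engines.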

Source

]]>
23
Mark Harris <![CDATA[How to Implement Performance Metrics in CUDA C/C++]]> http://test.markmark.net/?p=390 2023-05-22T22:52:22Z 2012-11-08T04:03:28Z In the first post of this series we looked at the basic elements of CUDA C/C++ by examining a CUDA C/C++ implementation of SAXPY. In this second post we discuss...]]> In the first post of this series we looked at the basic elements of CUDA C/C++ by examining a CUDA C/C++ implementation of SAXPY. In this second post we discuss...

In the first post of this series we looked at the basic elements of CUDA C/C++ by examining a CUDA C/C++ implementation of SAXPY. In this second post we discuss how to analyze the performance of this and other CUDA C/C++ codes. We will rely on these performance measurement techniques in future posts where performance optimization will be increasingly important. CUDA performance measurement is…
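The core technique the post covers is timing with CUDA events; a sketch (assumes a `saxpy` kernel and device pointers `d_x`, `d_y` of length `n` already exist):

```cpp
// CUDA events are timestamps recorded in a stream; because they are
// recorded on the GPU, they time device work accurately without
// stalling the CPU the way host timers around a kernel launch would.
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start);
saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
cudaEventRecord(stop);

cudaEventSynchronize(stop);              // block until the kernel finishes
float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds
```

From the elapsed time, effective bandwidth follows as bytes moved divided by seconds (SAXPY reads 2n floats and writes n).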

Source

]]>
20
Mark Harris <![CDATA[An Easy Introduction to CUDA C and C++]]> http://test.markmark.net/?p=316 2023-05-22T22:49:47Z 2012-10-31T08:20:21Z Update (January 2017): Check out a new, even easier introduction to CUDA! This post is the first in a series on CUDA C and C++, which is the C/C++ interface to...]]> Update (January 2017): Check out a new, even easier introduction to CUDA! This post is the first in a series on CUDA C and C++, which is the C/C++ interface to...

This post is the first in a series on CUDA C and C++, which is the C/C++ interface to the CUDA parallel computing platform. This series of posts assumes familiarity with programming in C. We will be running a parallel series of posts about CUDA Fortran targeted at Fortran programmers. These two series will cover the basic concepts of parallel computing on the CUDA platform. From here on unless I…
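The running example in this series is SAXPY (y = a*x + y); the classic CUDA version looks like the following sketch (requires `nvcc` and a GPU; launch parameters are the usual one-thread-per-element convention):

```cpp
// Each thread computes one element; the grid is sized so that
// gridDim.x * blockDim.x covers all n elements, with a bounds check
// for the leftover threads in the last block.
__global__ void saxpy(int n, float a, float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// launched from the host as:
//   saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
```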

Source

]]>
48
Mark Harris <![CDATA[Six Ways to SAXPY]]> http://www.parallelforall.com/?p=40 2023-02-13T18:13:03Z 2012-07-02T11:03:25Z For even more ways to SAXPY using the latest NVIDIA HPC SDK with standard language parallelism, see N Ways to SAXPY: Demonstrating the Breadth of GPU...]]> For even more ways to SAXPY using the latest NVIDIA HPC SDK with standard language parallelism, see N Ways to SAXPY: Demonstrating the Breadth of GPU...

Source

]]>
17
Mark Harris <![CDATA[Expressive Algorithmic Programming with Thrust]]> http://www.parallelforall.com/?p=29 2022-10-10T18:41:21Z 2012-06-06T00:06:11Z Thrust is a parallel algorithms library which resembles the C++ Standard Template Library (STL). Thrust's High-Level interface greatly enhances...]]> Thrust is a parallel algorithms library which resembles the C++ Standard Template Library (STL). Thrust's High-Level interface greatly enhances...

Source

]]>
2
Mark Harris <![CDATA[An OpenACC Example (Part 2)]]> http://www.parallelforall.com/?p=21 2023-05-18T22:12:51Z 2012-03-26T06:39:14Z You may want to read the more recent post Getting Started with OpenACC by Jeff Larkin. In my previous post I added 3 lines of OpenACC directives to a...]]> You may want to read the more recent post Getting Started with OpenACC by Jeff Larkin. In my previous post I added 3 lines of OpenACC directives to a...

You may want to read the more recent post Getting Started with OpenACC by Jeff Larkin. In my previous post I added 3 lines of OpenACC directives to a Jacobi iteration code, achieving more than 2x speedup by running it on a GPU. In this post I'll continue where I left off and demonstrate how we can use clauses on OpenACC directives to take more explicit control over how the compiler parallelizes our…

Source

]]>
2
Mark Harris <![CDATA[An OpenACC Example (Part 1)]]> http://www.parallelforall.com/?p=19 2023-05-18T22:12:40Z 2012-03-20T06:37:33Z You may want to read the more recent post Getting Started with OpenACC by Jeff Larkin. In this post I'll continue where I left off in my introductory...]]> You may want to read the more recent post Getting Started with OpenACC by Jeff Larkin. In this post I'll continue where I left off in my introductory...

You may want to read the more recent post Getting Started with OpenACC by Jeff Larkin. In this post I'll continue where I left off in my introductory post about OpenACC and provide a somewhat more realistic example. This simple C/Fortran code example demonstrates a 2x speedup with the addition of just a few lines of OpenACC directives, and in the next post I'll add just a few more lines to push…

Source

]]>
0
Mark Harris <![CDATA[OpenACC: Directives for GPUs]]> http://www.parallelforall.com/?p=12 2022-08-21T23:36:44Z 2012-03-13T05:56:45Z NVIDIA has made a lot of progress with CUDA over the past five years; we estimate that there are over 150,000 CUDA developers, and important science is being accomplished with the help of CUDA. But we have a long way to go to help everyone benefit from GPU computing. There are many programmers who can't afford the time to learn and apply a parallel programming language. Others…

Source

]]>
0