Quantitative developers need to run back-testing simulations to see how financial algorithms perform from a profit and loss (P&L) standpoint. Statistical techniques are important for visualizing the possible outcomes of these algorithms in terms of their possible P&L paths. GPUs can greatly reduce the amount of time needed to do this. In the broader picture, mathematical modeling of financial…
Generative physical AI models can understand and execute actions with fine or gross motor skills within the physical world. Understanding and navigating the 3D space of the physical world requires spatial intelligence. Achieving spatial intelligence in physical AI involves converting the real world into AI-ready virtual representations that the model can understand.
In the first part of the series, we presented an overview of the IVF-PQ algorithm and explained how it builds on top of the IVF-Flat algorithm, using the Product Quantization (PQ) technique to compress the index and support larger datasets. In this second part of the IVF-PQ post, we cover the practical aspects of tuning IVF-PQ performance. It's worth noting again that IVF-PQ uses a lossy…
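To make the compression idea concrete, here is a minimal product quantization sketch in NumPy. It is not the cuVS implementation; the helper names and parameters (4 subspaces, 256 centroids per subspace, a handful of k-means iterations) are illustrative only.

```python
import numpy as np

def train_pq(data, n_subspaces=4, n_centroids=256, n_iters=10, seed=0):
    """Train one small codebook per subspace with a few k-means iterations."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    sub_dim = dim // n_subspaces
    codebooks = []
    for s in range(n_subspaces):
        sub = data[:, s * sub_dim:(s + 1) * sub_dim]
        centroids = sub[rng.choice(n, n_centroids, replace=False)]
        for _ in range(n_iters):
            # Assign each sub-vector to its nearest centroid, then recompute centroids.
            labels = ((sub[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)
            for c in range(n_centroids):
                if (labels == c).any():
                    centroids[c] = sub[labels == c].mean(0)
        codebooks.append(centroids)
    return codebooks

def encode_pq(data, codebooks):
    """Replace each sub-vector with the index of its nearest codebook entry (one byte)."""
    sub_dim = data.shape[1] // len(codebooks)
    codes = np.empty((data.shape[0], len(codebooks)), dtype=np.uint8)
    for s, centroids in enumerate(codebooks):
        sub = data[:, s * sub_dim:(s + 1) * sub_dim]
        codes[:, s] = ((sub[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)
    return codes

vectors = np.random.rand(1000, 64).astype(np.float32)
codes = encode_pq(vectors, train_pq(vectors))  # 256 bytes per vector -> 4 bytes
```

Each 64-dimensional float32 vector shrinks from 256 bytes to 4 one-byte codes; that kind of compression is what lets a PQ-based index hold much larger datasets in GPU memory, at the cost of the lossy approximation mentioned above.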
In this post, we continue the series on accelerating vector search using NVIDIA cuVS. Our previous post in the series introduced IVF-Flat, a fast algorithm for accelerating approximate nearest neighbors (ANN) search on GPUs. We discussed how using an inverted file index (IVF) provides an intuitive way to reduce the complexity of the nearest neighbor search by limiting it to only a small subset of…
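For intuition, the following rough NumPy sketch shows the inverted-file idea independent of cuVS: cluster the database with k-means, then answer a query by scanning only the vectors in the few closest clusters. The parameter names (n_lists, n_probes) are illustrative, not the library's API.

```python
import numpy as np

def build_ivf(data, n_lists=16, n_iters=10, seed=0):
    """Coarse k-means; each inverted list holds the IDs of the vectors assigned to it."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), n_lists, replace=False)]
    for _ in range(n_iters):
        labels = ((data[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)
        for c in range(n_lists):
            if (labels == c).any():
                centroids[c] = data[labels == c].mean(0)
    labels = ((data[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)
    return centroids, [np.where(labels == c)[0] for c in range(n_lists)]

def search_ivf(query, data, centroids, lists, n_probes=2, k=5):
    """Scan only the n_probes lists whose centroids are closest to the query."""
    probes = ((centroids - query) ** 2).sum(-1).argsort()[:n_probes]
    candidates = np.concatenate([lists[c] for c in probes])
    dists = ((data[candidates] - query) ** 2).sum(-1)
    return candidates[dists.argsort()[:k]]

data = np.random.rand(5000, 32).astype(np.float32)
centroids, lists = build_ivf(data)
print(search_ivf(data[0], data, centroids, lists))  # IDs of the 5 nearest candidates
```

Probing more lists raises recall at the cost of scanning more vectors, which is exactly the accuracy/speed trade-off these posts explore.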
K-means is a clustering algorithm, one of the simplest and most popular unsupervised machine learning (ML) algorithms for data scientists.
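As a quick, runnable illustration, here is k-means on synthetic 2D data with scikit-learn; GPU-accelerated libraries such as RAPIDS cuML expose a very similar estimator interface.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two synthetic blobs; k-means should recover the two groups.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(100, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.5, size=(100, 2)),
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(km.cluster_centers_)  # roughly (0, 0) and (5, 5)
print(km.labels_[:5])       # cluster assignments for the first five points
```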
Note: As of January 6, 2025, VILA is part of the new Cosmos Nemotron vision language models. Visual language models have evolved significantly in recent years. However, existing technology typically supports only a single image: such models cannot reason across multiple images, do not support in-context learning, and cannot understand videos. They are also not optimized for inference speed. We developed VILA…
Mixture of experts (MoE) large language model (LLM) architectures have recently emerged, both in proprietary LLMs such as GPT-4 and in community models with the open-source release of Mistral AI's Mixtral 8x7B. The strong relative performance of the Mixtral model has raised much interest and numerous questions about MoE and its use in LLM architectures. So, what is MoE and why is it important?
While part 1 focused on the usage of the new NVIDIA cuTENSOR 2.0 CUDA math library, this post introduces additional usage modes, specifically usage from Python and Julia, and demonstrates the performance of cuTENSOR 2.0 based on benchmarks in a number of application domains. For more information…
NVIDIA cuTENSOR is a CUDA math library that provides optimized implementations of tensor operations, where tensors are dense, multi-dimensional arrays or array slices. The release of cuTENSOR 2.0 represents a major update over its predecessor in both functionality and performance. This version reimagines its APIs to be more expressive, including advanced just-in-time compilation capabilities, all…
Meta, NetworkX, Fast.ai, and other industry leaders share how to gain new insights from your data with emerging tools.
Performing an exhaustive exact k-nearest neighbor (kNN) search, also known as brute-force search, is expensive, and it doesn't scale particularly well to larger datasets. During vector search, brute-force search requires the distance to be calculated between every query vector and database vector. For the frequently used Euclidean and cosine distances, the computation task becomes equivalent to a…
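One reason the problem maps so well to GPUs is that, for squared Euclidean distance, expanding ||q - x||^2 = ||q||^2 - 2 q·x + ||x||^2 leaves a dense matrix product between the query and database matrices as the dominant cost. A minimal NumPy sketch of that formulation (illustrative, not the cuVS code):

```python
import numpy as np

def brute_force_knn(queries, database, k=10):
    """Exact kNN under squared Euclidean distance; the heavy term is a single matrix product."""
    q_norm = (queries ** 2).sum(1, keepdims=True)           # (n_queries, 1)
    d_norm = (database ** 2).sum(1)                          # (n_database,)
    dists = q_norm - 2.0 * (queries @ database.T) + d_norm   # (n_queries, n_database)
    idx = np.argpartition(dists, k, axis=1)[:, :k]           # k smallest, unordered
    order = np.take_along_axis(dists, idx, axis=1).argsort(axis=1)
    return np.take_along_axis(idx, order, axis=1)

queries = np.random.rand(8, 128).astype(np.float32)
database = np.random.rand(10_000, 128).astype(np.float32)
print(brute_force_knn(queries, database).shape)  # (8, 10)
```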
In this post, we dive deeper into each of the GPU-accelerated indexes mentioned in part 1 and give a brief explanation of how the algorithms work, along with a summary of important parameters to fine-tune their behavior. We then go through a simple end-to-end example to demonstrate cuVS' Python APIs on a question-and-answer problem with a pretrained large language model and provide a…
In the current AI landscape, vector search is one of the hottest topics due to its applications in large language models (LLMs) and generative AI. Semantic vector search enables a broad range of important tasks like detecting fraudulent transactions, recommending products to users, using contextual information to augment full-text searches, and finding actors that pose potential security risks.
Read this tutorial on how to tap into GPUs by importing cuDF instead of pandas, with only a few code changes.
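A minimal sketch of the kind of change involved; the file name and column names below are placeholders, and cuDF mirrors the pandas API closely enough for many common operations that the rest of the code is unchanged.

```python
# import pandas as pd        # CPU path
import cudf as pd            # GPU path: pandas-style API for many common operations

# "transactions.csv", "amount", and "customer_id" are hypothetical example data.
df = pd.read_csv("transactions.csv")
summary = (
    df[df["amount"] > 0]
    .groupby("customer_id")["amount"]
    .sum()
    .sort_values(ascending=False)
)
print(summary.head())
```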
Modeling time series data can be challenging (and fascinating) due to its inherent complexity and unpredictability. Long-term trends in time series can change drastically due to certain events, for example. Recall the beginning of the global pandemic, when businesses such as airlines or brick-and-mortar shops saw a quick decline in the number of customers and sales. In contrast…
Heterogeneous computing architectures, those that incorporate a variety of processor types working in tandem, have proven extremely valuable in the continued scalability of computational workloads in AI, machine learning (ML), quantum physics, and general data science. Critical to this development has been the ability to abstract away the heterogeneous architecture and promote a framework that…
If you are looking to take your machine learning (ML) projects to new levels of speed and scalability, GPU-accelerated data analytics can help you deliver insights quickly with breakthrough performance. From faster computation to efficient model training, GPUs bring many benefits to everyday ML tasks. Update: The blog below describes how to use GPU-only RAPIDS cuDF…
Read about an innovative GPU solution that solves the limitations of using small, biased datasets with RAPIDS cuDF.
Single-cell sequencing has become one of the most prominent technologies used in biomedical research. Its ability to decipher changes in the transcriptome and epigenome at the cell level has enabled researchers to gain valuable new insights. As a result, single-cell experiments have grown in size and complexity by a factor of over 100, with experiments involving more than 1 million cells becoming…
Reconstructing a smooth surface from a point cloud is a fundamental step in creating digital twins of real-world objects and scenes. Algorithms for surface reconstruction appear in various applications, such as industrial simulation, video game development, architectural design, medical imaging, and robotics. Neural Kernel Surface Reconstruction (NKSR) is the new NVIDIA algorithm for…
In the high-frequency trading world, thousands of market participants interact daily. In fact, high-frequency trading accounts for more than half of US equity trading volume, according to the paper High-Frequency Trading Synchronizes Prices in Financial Markets. Market makers are the big players on the sell side who provide liquidity in the market. Speculators are on the buy side…
QHack is an educational conference and the world's largest quantum machine learning (QML) hackathon. This year at QHack 2023, 2,850 individuals from 105 different countries competed for 8 days to build the most innovative solutions for quantum computing applications using NVIDIA quantum technology. The event was organized by Xanadu, with NVIDIA sponsoring the QHack 2023 NVIDIA Challenge.
Linear regression is a powerful statistical tool used to model the relationship between a dependent variable and one or more independent variables (features). An important, and often forgotten, concept in regression analysis is that of interaction terms. In short, interaction terms enable you to examine whether the relationship between the target and the independent variable changes depending on…
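A small synthetic example of what an interaction term looks like in practice (the variable names and the true coefficient of -1.5 are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
price = rng.uniform(1, 10, n)
is_weekend = rng.integers(0, 2, n)
# The effect of price on sales is stronger on weekends: true interaction = -1.5.
sales = 50 - 3 * price - 1.5 * price * is_weekend + rng.normal(0, 1, n)

# Design matrices without and with the interaction column price * is_weekend.
X_plain = np.column_stack([np.ones(n), price, is_weekend])
X_inter = np.column_stack([np.ones(n), price, is_weekend, price * is_weekend])

beta_plain, *_ = np.linalg.lstsq(X_plain, sales, rcond=None)
beta_inter, *_ = np.linalg.lstsq(X_inter, sales, rcond=None)
print(beta_inter)  # the last coefficient recovers roughly -1.5; the plain model cannot
```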
As a data scientist, you know that evaluating machine learning model performance is a crucial aspect of your work. To do so effectively, you have a wide range of statistical metrics at your disposal, each with its own unique strengths and weaknesses. By developing a solid understanding of these metrics, you are not only better equipped to choose the best one for optimizing your model but also to explain your…
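As a reminder of how a few of the most common classification metrics relate to the confusion matrix, here is a small from-scratch sketch (binary labels, invented example data):

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 computed directly from the confusion matrix."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
print(classification_metrics(y_true, y_pred))
```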
Linear regression is one of the simplest machine learning models out there. It is often the starting point not only for learning about data science but also for building quick and simple minimum viable products (MVPs), which then serve as benchmarks for more complex algorithms. In general, linear regression fits a line (in two dimensions) or a hyperplane (in three or more dimensions) that…
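For reference, the fitted hyperplane can be obtained in closed form with ordinary least squares; a NumPy sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                # three features
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0 + rng.normal(0, 0.1, 200)

Xb = np.column_stack([np.ones(len(X)), X])   # prepend a column of ones for the intercept
# Normal equations: w = (X^T X)^(-1) X^T y, solved as a linear system for stability.
w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
print(w)  # approximately [4.0, 2.0, -1.0, 0.5]
```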
Inverse lithography technology (ILT) was first implemented and demonstrated in early 2003. It was created by Danping Peng while he worked as an engineer at Luminescent Technologies Inc., a startup company founded by professors Stanley Osher and Eli Yablonovitch from UCLA and entrepreneurs Dan Abrams and Jack Herrik. At that time, ILT was a revolutionary solution that showed far superior…
Data scientists deal with algorithms daily. However, the data science discipline as a whole has developed into a role that does not involve the implementation of sophisticated algorithms. Nonetheless, practitioners can still benefit from building an understanding and repertoire of algorithms. In this article, the sorting algorithm merge sort is introduced, explained, evaluated, and implemented.
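For readers who want the gist before diving into the article, this is the standard top-down formulation of merge sort (a generic sketch, not the article's exact listing):

```python
def merge_sort(values):
    """Split the list, sort each half recursively, then merge the two sorted halves."""
    if len(values) <= 1:
        return values
    mid = len(values) // 2
    return merge(merge_sort(values[:mid]), merge_sort(values[mid:]))

def merge(left, right):
    """Merge two already-sorted lists in linear time."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

The recursion gives the familiar O(n log n) behavior: log n levels of splitting, with a linear-time merge at each level.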
This article discusses how to prepare text through vectorization, hashing, tokenization, and other techniques to be compatible with machine learning (ML) and other numerical algorithms. I'll explain and demonstrate the process. Natural language processing (NLP) applies machine learning and other techniques to language. However, machine learning and other techniques typically work on…
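A toy sketch of two of the techniques mentioned, a shared-vocabulary bag of words and the hashing trick, using a deliberately simple tokenizer (illustrative only; production pipelines use stable hash functions and far richer tokenization):

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and extract word-like tokens; intentionally simplistic."""
    return re.findall(r"[a-z0-9']+", text.lower())

def bag_of_words(texts):
    """Map each document to a vector of term counts over a shared vocabulary."""
    vocab = sorted({tok for t in texts for tok in tokenize(t)})
    return vocab, [[Counter(tokenize(t)).get(tok, 0) for tok in vocab] for t in texts]

def hashing_vector(text, n_buckets=16):
    """Hashing trick: fixed-size vector, no vocabulary needed (collisions possible).

    Note: Python's built-in hash is salted per process; real implementations use a
    stable hash such as MurmurHash so vectors are reproducible across runs.
    """
    vec = [0] * n_buckets
    for tok in tokenize(text):
        vec[hash(tok) % n_buckets] += 1
    return vec

docs = ["The cat sat on the mat.", "The dog sat on the log."]
vocab, vectors = bag_of_words(docs)
print(vocab)
print(vectors)
print(hashing_vector(docs[0]))
```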
Algorithms are commonplace in the world of data science and machine learning. Algorithms power social media applications, Google search results, banking systems, and plenty more. Therefore, it's paramount that data scientists and machine learning practitioners have an intuition for analyzing, designing, and implementing algorithms. Efficient algorithms have saved companies millions of dollars…
Facebook researchers developed a reinforcement learning model that can outmatch human competitors in heads-up, no-limit Texas hold'em and turn endgame hold'em poker. At the heart of the model is how software agents handle perfect-information games such as chess versus imperfect-information games like poker. Instead of just deciding on its next move, a reinforcement learning software…
Recently, the Allen Institute for Artificial Intelligence announced a breakthrough for a BERT-based model, passing a 12th-grade science test. The GPU-accelerated system, called Aristo, can read, learn, and reason about science, in this case emulating the decision making of students. For this milestone, Aristo answered more than 90 percent of the questions on an eighth-grade science exam correctly…
London-based startup Fabula AI has developed a deep learning-based system that can help identify fake news across online platforms. "Automatically detecting fake news poses challenges that defy existing approaches based on linguistic content analysis," the company stated in a blog post. "News is often highly nuanced and their interpretation requires the knowledge of political or social context…
In efficient parallel algorithms, threads cooperate and share data to perform collective computations. To share data, the threads must synchronize. The granularity of sharing varies from algorithm to algorithm, so thread synchronization should be flexible. Making synchronization an explicit part of the program ensures safety, maintainability, and modularity. CUDA 9 introduces Cooperative Groups…
There's a new computational workhorse in town. For decades, general matrix-matrix multiply, known as GEMM in Basic Linear Algebra Subroutines (BLAS) libraries, has been a standard benchmark for computational performance. GEMM is possibly the most optimized and widely used routine in scientific computing. Expert implementations are available for every architecture and quickly achieve the peak…
Google recently announced the release of version 1.0 of its TensorFlow deep learning framework at its inaugural TensorFlow Developer Summit. In just its first year, the popular framework has helped researchers make progress with everything from language translation to early detection of skin cancer and preventing blindness in diabetics. The first major version comes with some fantastic new…
Russian scientists from Lomonosov Moscow State University used an ordinary GPU-accelerated desktop computer to solve, in just 15 minutes, complex quantum mechanics equations that would typically take two to three days on a large CPU-only supercomputer. Senior researchers Vladimir Pomerantcev and Olga Rubtsova and professor Vladimir Kukulin used a GeForce GTX 670 with CUDA and the PGI CUDA Fortran…
Adam McLaughlin, a PhD student at Georgia Tech, shares how he is using NVIDIA Tesla GPUs for his research on Betweenness Centrality, a graph analytics algorithm that tracks the most important vertices within a network. This can be applied to a broad range of applications, such as finding the head of a crime ring or determining the best location for a store within a city. Using a cluster of GPUs for…
Daniel Ambrosi, artist and photographer, is using NVIDIA GPUs in the Amazon cloud and CUDA to create giant 2D-stitched HDR panoramas called "Dreamscapes." Ambrosi applies a modified version of Google's DeepDream neural net visualization code to his original panoramic landscape images to create truly one-of-a-kind pieces of art. For more information, visit http://www.danielambrosi.com/
Linear solvers are probably the most common tool in scientific computing applications. There are two basic classes of methods that can be used to solve an equation: direct and iterative. Direct methods are usually robust, but have additional computational complexity and memory capacity requirements. Unlike direct solvers, iterative solvers require minimal memory overhead and feature better…
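The contrast is easy to see on a small, well-conditioned system: a direct solve factors the matrix once, while an iterative method such as Jacobi only ever needs matrix-vector products. A NumPy sketch on a synthetic diagonally dominant matrix (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
A = rng.normal(size=(n, n)) + n * np.eye(n)  # strongly diagonally dominant, so Jacobi converges
b = rng.normal(size=n)

x_direct = np.linalg.solve(A, b)             # direct method (LU factorization under the hood)

# Jacobi iteration: x_{k+1} = D^{-1} (b - (A - D) x_k), using only mat-vec products.
D = np.diag(A)
x = np.zeros(n)
for _ in range(100):
    x = (b - (A @ x - D * x)) / D

print(np.max(np.abs(x - x_direct)))          # tiny difference: the iteration has converged
```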
Columbia University researchers have created a robotic system that detects wrinkles and then irons the piece of cloth autonomously. Their paper highlights that the ironing process is the final step needed in their "pipeline" of a robot picking up a wrinkled shirt, then laying it on the table and lastly, folding the shirt with robotic arms. A GeForce GTX 770 GPU was used for their "wrinkle analysis…
Leyuan Wang, a Ph.D. student in the UC Davis Department of Computer Science, presented one of only two "Distinguished Papers" of the 51 accepted at Euro-Par 2015. Euro-Par is a European conference devoted to all aspects of parallel and distributed processing held August 24-28 at Austria's Vienna University of Technology. Leyuan's paper Fast Parallel Suffix Array on the GPU, co-authored by her…
Some years ago I started work on my first CUDA implementation of the Multiparticle Collision Dynamics (MPC) algorithm, a particle-in-cell code used to simulate hydrodynamic interactions between solvents and solutes. As part of this algorithm, a number of particle parameters are summed to calculate certain cell parameters. This was in the days of the Tesla GPU architecture (such as GT200 GPUs…
Histograms are an important data representation with many applications in computer vision, data analytics, and medical imaging. A histogram is a graphical representation of the data distribution across predefined bins. The input data set and the number of bins can vary greatly depending on the domain, so let's focus on one of the most common use cases: an image histogram using 256 bins for each…
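As a baseline for what the GPU version computes, here is a 256-bin-per-channel image histogram in NumPy on a synthetic 8-bit RGB image (placeholder data):

```python
import numpy as np

def channel_histograms(image):
    """Per-channel 256-bin histograms for an 8-bit image of shape (H, W, C)."""
    return np.stack([
        np.bincount(image[..., c].ravel(), minlength=256)
        for c in range(image.shape[-1])
    ])

image = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)  # stand-in for a real image
hist = channel_histograms(image)
print(hist.shape)         # (3, 256)
print(hist.sum(axis=1))   # each channel sums to 480 * 640 pixels
```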
Note: This post has been updated (November 2017) for CUDA 9 and the latest GPUs. The NVCC compiler now performs warp aggregation for atomics automatically in many cases, so you can get higher performance with no extra effort. In fact, the code generated by the compiler is actually faster than the manually written warp aggregation code. This post is mainly intended for those who want to learn how…
Parallel reduction is a common building block for many parallel algorithms. A presentation from 2007 by Mark Harris provided a detailed strategy for implementing parallel reductions on GPUs, but this 6-year-old document bears updating. In this post I will show you some features of the Kepler GPU architecture which make reductions even faster: the shuffle (SHFL) instruction and fast device memory…
In part II of this series, we looked at hierarchical tree traversal as a means of quickly identifying pairs of potentially colliding 3D objects and we demonstrated how optimizing for low divergence can result in substantial performance gains on massively parallel processors. Having a fast traversal algorithm is not very useful, though, unless we also have a tree to go with it. In this part…
In the first part of this series, we looked at collision detection on the GPU and discussed two commonly used algorithms that find potentially colliding pairs in a set of 3D objects using their axis-aligned bounding boxes (AABBs). Each of the two algorithms has its weaknesses: sort and sweep suffers from high execution divergence, while uniform grid relies on too many simplifying assumptions that…
This series of posts aims to highlight some of the main differences between conventional programming and parallel programming on the algorithmic level, using broad-phase collision detection as an example. The first part will give some background, discuss two commonly used approaches, and introduce the concept of divergence. The second part will switch gears to hierarchical tree traversal in order…
Fresh from the NVIDIA Numeric Libraries Team, a white paper illustrating the use of the CUSPARSE and CUBLAS libraries to achieve a 2x speedup of incomplete-LU- and Cholesky-preconditioned iterative methods. The paper focuses on the Bi-Conjugate Gradient and stabilized Conjugate Gradient iterative methods that can be used to solve large sparse non-symmetric and symmetric positive definite linear…