CUDA Primitives Power Data Science on GPUs

NVIDIA provides a suite of machine learning and analytics software libraries to accelerate end-to-end data science pipelines entirely on GPUs. This work is enabled by over 15 years of CUDA development. GPU-accelerated libraries abstract the strengths of low-level CUDA primitives. Libraries for linear algebra, advanced math, and parallel algorithms lay the foundation for an ecosystem of compute-intensive applications.

With NVIDIA’s libraries, you get highly efficient implementations of algorithms that are regularly extended and optimized. Whether you are building a new application or trying to speed up an existing application, NVIDIA’s libraries provide the easiest way to get started with GPUs. You can download NVIDIA CUDA-X AI libraries as part of the CUDA Toolkit and NVIDIA RAPIDS.

Linear Algebra and Math Libraries


cuBLAS

A fast GPU-accelerated implementation of the standard basic linear algebra subroutines (BLAS)


cuSPARSE

Provides GPU-accelerated basic linear algebra subroutines for sparse matrices
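
As a rough illustration of what a sparse linear algebra subroutine computes, the sketch below multiplies a matrix stored in compressed sparse row (CSR) form by a dense vector. It is plain Python for readability, not GPU code and not any library's API; the function name `csr_matvec` and the sample matrix are made up for this example.

```python
def csr_matvec(values, col_indices, row_offsets, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values      -- nonzero entries, row by row
    col_indices -- column index of each nonzero entry
    row_offsets -- row i owns values[row_offsets[i]:row_offsets[i + 1]]
    """
    y = []
    for row in range(len(row_offsets) - 1):
        start, end = row_offsets[row], row_offsets[row + 1]
        # Only the stored nonzeros of this row contribute to the dot product.
        y.append(sum(values[k] * x[col_indices[k]] for k in range(start, end)))
    return y


# Illustrative 3x3 matrix (hypothetical data):
# [[4, 0, 0],
#  [0, 0, 2],
#  [1, 0, 3]]
values = [4.0, 2.0, 1.0, 3.0]
col_indices = [0, 2, 0, 2]
row_offsets = [0, 1, 2, 4]

y = csr_matvec(values, col_indices, row_offsets, [1.0, 2.0, 3.0])
print(y)
```

Because the zeros are never stored or touched, the work scales with the number of nonzero entries rather than with the full matrix size. A GPU library can additionally process many rows in parallel.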


cuSOLVER

A collection of dense and sparse direct solvers to accelerate linear optimization applications and more

Parallel Algorithm Libraries


NCCL

Implements multi-GPU and multi-node collective communication primitives that are performance optimized for NVIDIA GPUs
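
To make "collective communication primitive" concrete, here is a minimal sketch of the result of one common collective, a sum all-reduce, in which every participant ends up with the element-wise sum of all participants' buffers. It simulates the devices as plain Python lists and shows only the semantics, not how a GPU library performs the exchange; `all_reduce_sum` is a hypothetical name used for illustration.

```python
def all_reduce_sum(per_device_buffers):
    """Return the buffer each simulated device holds after a sum all-reduce."""
    length = len(per_device_buffers[0])
    if any(len(buf) != length for buf in per_device_buffers):
        raise ValueError("all buffers must have the same length")
    # Element-wise sum across devices.
    reduced = [sum(vals) for vals in zip(*per_device_buffers)]
    # Every device receives its own copy of the reduced result.
    return [list(reduced) for _ in per_device_buffers]


# Three simulated devices, each holding a partial result (hypothetical data).
partials = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
result = all_reduce_sum(partials)
print(result)
```

This pattern appears, for example, in data-parallel training, where each GPU computes gradients on its own batch and all GPUs need the combined gradients before the next step.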


Thrust

Provides a flexible, high-level interface for GPU programming to enhance developer productivity


Much of the new data science development work is focused on hardening an open-source project called RAPIDS. RAPIDS, part of CUDA-X AI, relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

RAPIDS also focuses on common data preparation tasks for ETL, analytics, and machine learning. This includes a familiar DataFrame API that integrates with a variety of machine learning algorithms to accelerate end-to-end pipelines without paying typical serialization costs.

RAPIDS Workflow


  • ANALYTICS and ETL - cuDF is a DataFrame manipulation library based on Apache Arrow that accelerates loading, filtering, and manipulation of data to prepare it for model training. The Python bindings of the core CUDA-accelerated DataFrame manipulation primitives mirror the pandas interface for seamless onboarding of pandas users.
  • MACHINE LEARNING - cuML is a collection of GPU-accelerated machine learning libraries that aims to provide GPU versions of the machine learning algorithms available in scikit-learn.
  • GRAPH ANALYTICS - cuGraph is a collection of graph analytics libraries that seamlessly integrate into the RAPIDS data science platform.
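
Because the cuDF interface mirrors pandas, a small ETL step can be sketched so that the same lines run with either library. The example below is illustrative only: it assumes cuDF is installed with a usable GPU and otherwise falls back to pandas, and the column names and values are made up.

```python
# Use cuDF when it is available; otherwise fall back to pandas so the same
# code still runs on a CPU-only machine. The broad except also covers the
# case where cuDF is installed but no usable GPU is present.
try:
    import cudf as xdf
except Exception:
    import pandas as xdf

# Hypothetical sample data, made up for this illustration.
df = xdf.DataFrame({
    "store": ["a", "b", "a", "c", "b"],
    "units": [3, 5, 2, 7, 1],
    "price": [2.0, 1.5, 2.0, 4.0, 1.5],
})

# Derive a column, filter rows, then aggregate per store.
df["revenue"] = df["units"] * df["price"]
large_sales = df[df["revenue"] > 5.0]
totals = large_sales.groupby("store")["revenue"].sum().sort_index()

print(totals)
```

The import alias is the only line that differs between the two libraries here, which is the "minimal code changes" idea in practice. Not every pandas feature is guaranteed to have a cuDF equivalent, so larger pipelines should be checked against the cuDF documentation.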

RAPIDS Features

Hassle-Free Integration

Accelerate your Python data science toolchain with minimal code changes and no new tools to learn.

Top Model Accuracy

Increase machine learning model accuracy by iterating on models faster and deploying them more frequently.

Reduced Training Time

Drastically improve your productivity with near-interactive data science.

Open Source

Customizable, extensible, interoperable - the open-source software is supported by NVIDIA and built on Apache Arrow.

Get Started

Experience accelerated machine learning and data science on GPUs with RAPIDS.

RAPIDS Webpage