pandas – NVIDIA Technical Blog

News and tutorials for developers, data scientists, and IT admins
Feed updated 2025-07-03T22:20:47Z. http://www.open-lab.net/blog/feed/

Get Started with GPU Acceleration for Data Science
By Allison Ding. Published 2025-02-06T23:07:48Z, updated 2025-04-23T02:52:30Z. http://www.open-lab.net/blog/?p=95894

In data science, operational efficiency is key to handling increasingly complex and large datasets. GPU acceleration has become essential for modern workflows, offering significant performance improvements. RAPIDS is a suite of open-source libraries and frameworks developed by NVIDIA, designed to accelerate data science pipelines using GPUs with minimal code changes.

Supercharging Deduplication in pandas Using RAPIDS cuDF
By Bradley Dice. Published 2024-11-28T14:00:00Z, updated 2024-12-12T19:38:34Z. http://www.open-lab.net/blog/?p=92703

A common operation in data analytics is to drop duplicate rows. Deduplication is critical in Extract, Transform, Load (ETL) workflows, where you might want to...
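To make "dropping duplicate rows" concrete, here is a minimal plain-Python sketch of first-occurrence deduplication. It is an illustration only, not the method used in the post: the helper name `drop_duplicate_rows`, its `subset` argument, and the sample rows are hypothetical. In pandas, the equivalent operation is `DataFrame.drop_duplicates`, which also accepts a `subset` of columns.

```python
def drop_duplicate_rows(rows, subset=None):
    """Return rows with duplicates removed, keeping the first occurrence.

    rows:   list of dicts, one dict per row (hypothetical layout).
    subset: optional list of column names that define a duplicate;
            when omitted, all columns of each row are compared.
    """
    seen = set()
    unique = []
    for row in rows:
        columns = subset if subset is not None else sorted(row)
        # Build a hashable key from the columns that define identity.
        key = tuple((col, row[col]) for col in columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [
    {"order_id": 1, "customer": "ana"},
    {"order_id": 2, "customer": "bo"},
    {"order_id": 1, "customer": "ana"},  # exact duplicate of the first row
    {"order_id": 3, "customer": "ana"},
]

deduped = drop_duplicate_rows(rows)                           # compare all columns
by_customer = drop_duplicate_rows(rows, subset=["customer"])  # one row per customer
```

Keeping a set of already-seen keys makes this a single pass over the data while preserving the original row order.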

Scaling Up to One Billion Rows of Data in pandas using RAPIDS cuDF
By Gregory Kimball. Published 2024-09-11T16:54:53Z, updated 2024-09-25T17:26:00Z. http://www.open-lab.net/blog/?p=88761

The One Billion Row Challenge is a fun benchmark to showcase basic data processing operations. It was originally launched as a pure-Java competition, and has gathered a community of developers in other languages, including Python, Rust, Go, Swift, and more. The challenge has been useful for many software engineers with an interest in exploring the details of text file reading...
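As a rough picture of the kind of "basic data processing" involved, the sketch below aggregates minimum, mean, and maximum values per key from delimited text. It rests on assumptions not stated in the excerpt: that each input line has the form `station;temperature` and that the goal is a per-station min/mean/max. The function name `aggregate_stations` and the sample data are hypothetical, and this is plain Python rather than the pandas or cuDF code from the post.

```python
import io


def aggregate_stations(lines):
    """Compute (min, mean, max) per station from 'station;temperature' lines."""
    stats = {}  # station -> [min, max, running sum, count]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        station, _, value = line.partition(";")
        temp = float(value)
        entry = stats.get(station)
        if entry is None:
            stats[station] = [temp, temp, temp, 1]
        else:
            entry[0] = min(entry[0], temp)
            entry[1] = max(entry[1], temp)
            entry[2] += temp
            entry[3] += 1
    return {
        station: (low, total / count, high)
        for station, (low, high, total, count) in stats.items()
    }


# A tiny in-memory stand-in for the (much larger) input file.
sample = io.StringIO("Hamburg;12.0\nLima;20.5\nHamburg;8.0\nLima;19.5\n")
result = aggregate_stations(sample)
```

Streaming line by line keeps memory proportional to the number of distinct stations rather than the number of rows, which is what makes the text-reading step the interesting bottleneck at scale.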

Just Released: RAPIDS 24.08
By Prachi Goel. Published 2024-08-29T16:00:58Z, updated 2024-09-05T17:57:13Z. http://www.open-lab.net/blog/?p=88370

RAPIDS 24.08 is now available with significant updates geared towards processing larger workloads and seamless CPU/GPU interoperability.

Build Efficient Recommender Systems with Co-Visitation Matrices and RAPIDS cuDF
By Théo Viel. Published 2024-08-21T20:30:00Z, updated 2024-09-05T17:57:26Z. http://www.open-lab.net/blog/?p=86997

Recommender systems play a crucial role in personalizing user experiences across various platforms. These systems are designed to predict and suggest items that users are likely to interact with, based on their past behavior and preferences. Building an effective recommender system involves understanding and leveraging huge, complex datasets that capture interactions between users and items.

RAPIDS cuDF Unified Memory Accelerates pandas up to 30x on Large Datasets
By Prachi Goel. Published 2024-08-09T16:00:00Z, updated 2024-08-22T18:25:33Z. http://www.open-lab.net/blog/?p=87019

NVIDIA has released RAPIDS cuDF unified memory and text data processing features that help data scientists continue to use pandas when working with larger and text-heavy datasets in demanding workloads. Data scientists can now accelerate these workloads by up to 30x. RAPIDS is a collection of open-source GPU-accelerated data science and AI libraries. cuDF is a Python GPU DataFrame library for...

RAPIDS on Databricks: A Guide to GPU-Accelerated Data Processing
By Sheilah Kirui. Published 2024-05-14T20:30:00Z, updated 2024-05-30T19:55:56Z. http://www.open-lab.net/blog/?p=82441

In today's data-driven landscape, maximizing performance and efficiency in data processing and analytics is critical. While many Databricks users are familiar with using GPU clusters for machine learning training, there's a vast opportunity to leverage GPU acceleration for data processing and analytics tasks as well. Databricks' Data Intelligence Platform empowers users to manage both small...

RAPIDS cuDF Instantly Accelerates pandas up to 50x on Google Colab
By Nick Becker. Published 2024-05-14T20:30:00Z, updated 2024-05-30T19:55:55Z. http://www.open-lab.net/blog/?p=82534

At Google I/O'24, Laurence Moroney, head of AI Advocacy at Google, announced that RAPIDS cuDF is now integrated into Google Colab. Developers can now instantly accelerate pandas code up to 50x on Google Colab GPU instances, and continue using pandas as data grows, without sacrificing performance. RAPIDS cuDF is a GPU DataFrame library that accelerates the data processing tool pandas with zero...

Top Data Science Sessions from NVIDIA GTC 2024 Now Available On Demand
By Belen Tegegn. Published 2024-04-29T22:40:06Z, updated 2024-05-02T21:34:01Z. http://www.open-lab.net/blog/?p=81594

At GTC 2024, experts from NVIDIA and our partners shared insights about GPU-accelerated tools, optimizations, and best practices for data scientists. From the hundreds of sessions covering various topics, we've handpicked the top three data science sessions that you won't want to miss.

RAPIDS in 2024: Accelerated Data Science Everywhere
Speakers: Dante Gama Dessavre...

RAPIDS cuDF Accelerates pandas Nearly 150x with Zero Code Changes
By Jay Rodge. Published 2024-03-18T22:00:00Z, updated 2024-05-15T15:55:04Z. http://www.open-lab.net/blog/?p=72591

At NVIDIA GTC 2024, it was announced that RAPIDS cuDF can now bring GPU acceleration to 9.5 million pandas users without requiring them to change their code. Update: RAPIDS cuDF now instantly accelerates pandas with zero code changes in Google Colab. Try out the tutorial in a Colab notebook today. pandas, a flexible and powerful data analysis and manipulation library for Python...

Analyzing the Security of Machine Learning Research Code
By Joseph Lucas. Published 2023-10-04T18:00:00Z, updated 2024-07-08T21:33:52Z. http://www.open-lab.net/blog/?p=71113

The NVIDIA AI Red Team is focused on scaling secure development practices across the data, science, and AI ecosystems. We participate in open-source security initiatives, release tools, present at industry conferences, host educational competitions, and provide innovative training. Covering 3 years and totaling almost 140GB of source code, the recently released Meta Kaggle for Code dataset is...

Workshop: Fundamentals of Deep Learning
By Michelle Horton. Published 2023-09-08T16:00:00Z, updated 2024-04-29T20:58:34Z. http://www.open-lab.net/blog/?p=70476

Learn key techniques and tools required to train a deep learning model in this virtual hands-on workshop.

ICYMI: Utilizing GPUs for Machine Learning with RAPIDS
By Jess Nguyen. Published 2023-08-23T18:45:15Z, updated 2023-08-24T18:03:35Z. http://www.open-lab.net/blog/?p=69941

Delve into how TMA Solutions is accelerating original ML and AI workflows with RAPIDS.

ICYMI: Unlocking the Power of GPU-Accelerated DataFrames in Python
By Jess Nguyen. Published 2023-08-04T16:00:00Z, updated 2023-08-24T18:03:51Z. http://www.open-lab.net/blog/?p=68916

Read this tutorial on how to tap into GPUs by importing cuDF instead of pandas, with only a few code changes.

New Video: Visualizing Census Data with RAPIDS cuDF and Plotly Dash
By Jess Nguyen. Published 2023-07-17T21:00:00Z, updated 2023-12-12T23:51:32Z. http://www.open-lab.net/blog/?p=68219

Gathering business insights can be a pain, especially when you're dealing with countless data points. It's no secret that GPUs can be a time-saver for data scientists. Rather than wait for a single query to run, GPUs help speed up the process and get you the insights you need quickly. In this video, Allan Enemark, RAPIDS data visualization lead, uses a US Census dataset with over 300...

Accelerated Data Analytics: A Guide to Data Visualization with RAPIDS
By Allan Enemark. Published 2023-07-11T20:00:00Z, updated 2023-12-12T23:47:02Z. http://www.open-lab.net/blog/?p=67804

Visualization brings data to life, unveiling hidden patterns and insights through accessible visuals, and empowering you and your organization to perceive the invisible, make informed decisions, and fully leverage your data. Especially when working with large datasets, interaction can be difficult as render and compute times become prohibitive. Switching to RAPIDS libraries, such as cuDF...

Limit Order Book Dataset Generation for Accelerated Short-Term Price Prediction with RAPIDS
By Andrew Briand. Published 2023-05-19T17:00:00Z, updated 2023-06-27T16:06:02Z. http://www.open-lab.net/blog/?p=64676

In the high-frequency trading world, thousands of market participants interact daily. In fact, high-frequency trading accounts for more than half of the US equity trading volume, according to the paper High-Frequency Trading Synchronizes Prices in Financial Markets. Market makers are the big players on the sell side who provide liquidity in the market. Speculators are on the buy side...

Accelerated Data Analytics: Speed Up Data Exploration with RAPIDS cuDF
By Prachi Goel. Published 2023-03-14T14:01:00Z, updated 2023-12-12T23:48:52Z. http://www.open-lab.net/blog/?p=61837

This post is part of a series on accelerated data analytics. Digital advancements in climate modeling, healthcare, finance, and retail are generating unprecedented volumes and types of data. IDC says that by 2025, there will be 180 ZB of data compared to 64 ZB in 2020, scaling up the need for data analytics to turn all that data into insights. NVIDIA provides the RAPIDS suite of...

Accelerated Data Analytics: Faster Time Series Analysis with RAPIDS cuDF
By Prachi Goel. Published 2023-03-14T14:00:00Z, updated 2025-05-07T22:43:39Z. http://www.open-lab.net/blog/?p=61790

This post is part of a series on accelerated data analytics. Update: This post describes how to use GPU-only RAPIDS cuDF, which requires code changes. RAPIDS cuDF now has CPU/GPU interoperability (cudf.pandas) that speeds up pandas code by up to 150x with zero code changes. At GTC 2024, NVIDIA announced that the cudf.pandas library is now GA. At Google I/O...

Accelerating ETL on KubeFlow with RAPIDS
By Jacob Tomlinson. Published 2022-08-30T20:58:47Z, updated 2023-11-10T01:32:59Z. http://www.open-lab.net/blog/?p=54194

In the machine learning and MLOps world, GPUs are widely used to speed up model training and inference, but what about the other stages of the workflow like ETL pipelines or hyperparameter optimization? Within the RAPIDS data science framework, ETL tools are designed to have a familiar look and feel to data scientists working in Python. Do you currently use Pandas, NumPy, Scikit-learn...

Prototyping Faster with the Newest UDF Enhancements in the NVIDIA cuDF API
By Brandon Miller. Published 2022-05-27T19:57:02Z, updated 2024-07-12T21:35:48Z. http://www.open-lab.net/blog/?p=47368

Over the past few releases, the NVIDIA cuDF team has added several new features to user-defined functions (UDFs) that can streamline the development process while improving overall performance. In this post, I walk through the new UDF enhancements and show how you can take advantage of them within your own applications. If you're not familiar with pandas, series apply is the main...

Python Pandas Tutorial: A Beginner's Guide to GPU Accelerated DataFrames for Pandas Users
By Tom Drabas. Published 2021-03-11T18:19:17Z, updated 2024-05-15T16:09:08Z. http://www.open-lab.net/blog/?p=24011

This series on the RAPIDS ecosystem explores the various aspects that enable you to solve extract, transform, load (ETL) problems, build machine learning (ML) and deep learning (DL) models, explore expansive graphs, process signal and system logs, or use the SQL language through BlazingSQL to process data. For part 1, see Pandas DataFrame Tutorial: A Beginner's Guide to GPU Accelerated DataFrames...

Pandas DataFrame Tutorial – Beginner's Guide to GPU Accelerated DataFrames in Python
By Tom Drabas. Published 2021-03-03T18:22:21Z, updated 2024-05-15T16:07:38Z. http://www.open-lab.net/blog/?p=23974

This post is the first installment of the series of introductions to the RAPIDS ecosystem. The series explores and discusses various aspects of RAPIDS that allow its users to solve ETL (Extract, Transform, Load) problems, build ML (Machine Learning) and DL (Deep Learning) models, explore expansive graphs, process geospatial, signal, and system log data, or use the SQL language via BlazingSQL to process...
