In data science, operational efficiency is key to handling increasingly complex and large datasets. GPU acceleration has become essential for modern workflows, offering significant performance improvements. RAPIDS is a suite of open-source libraries and frameworks developed by NVIDIA, designed to accelerate data science pipelines using GPUs with minimal code changes.
]]>The One Billion Row Challenge is a fun benchmark to showcase basic data processing operations. It was originally launched as a pure-Java competition, and has gathered a community of developers in other languages, including Python, Rust, Go, Swift, and more. The challenge has been useful for many software engineers with an interest in exploring the details of text file reading��
]]>RAPIDS 24.08 is now available with significant updates geared towards processing larger workloads and seamless CPU/GPU interoperability.
]]>Recommender systems play a crucial role in personalizing user experiences across various platforms. These systems are designed to predict and suggest items that users are likely to interact with, based on their past behavior and preferences. Building an effective recommender system involves understanding and leveraging huge, complex datasets that capture interactions between users and items.
]]>NVIDIA has released RAPIDS cuDF unified memory and text data processing features that help data scientists continue to use pandas when working with larger and text-heavy datasets in demanding workloads. Data scientists can now accelerate these workloads by up to 30x. RAPIDS is a collection of open-source GPU-accelerated data science and AI libraries. cuDF is a Python GPU DataFrame library for��
]]>In today��s data-driven landscape, maximizing performance and efficiency in data processing and analytics is critical. While many Databricks users are familiar with using GPU clusters for machine learning training, there��s a vast opportunity to leverage GPU acceleration for data processing and analytics tasks as well. Databricks�� Data Intelligence Platform empowers users to manage both small��
]]>At Google I/O��24, Laurence Moroney, head of AI Advocacy at Google, announced that RAPIDS cuDF is now integrated into Google Colab. Developers can now instantly accelerate pandas code up to 50x on Google Colab GPU instances, and continue using pandas as data grows��without sacrificing performance. RAPIDS cuDF is a GPU DataFrame library that accelerates the data processing tool pandas with zero��
]]>At GTC 2024, experts from NVIDIA and our partners shared insights about GPU-accelerated tools, optimizations, and best practices for data scientists. From the hundreds of sessions covering various topics, we��ve handpicked the top three data science sessions that you won��t want to miss. RAPIDS in 2024: Accelerated Data Science Everywhere Speakers: Dante Gama Dessavre��
]]>At NVIDIA GTC 2024, it was announced that RAPIDS cuDF can now bring GPU acceleration to 9.5M million pandas users without requiring them to change their code. Update: RAPIDS cuDF now instantly accelerates pandas with zero code changes in Google Colab. Try out the tutorial in a Colab notebook today. pandas, a flexible and powerful data analysis and manipulation library for Python��
]]>The NVIDIA AI Red Team is focused on scaling secure development practices across the data, science, and AI ecosystems. We participate in open-source security initiatives, release tools, present at industry conferences, host educational competitions, and provide innovative training. Covering 3 years and totaling almost 140GB of source code, the recently released Meta Kaggle for Code dataset is��
]]>Learn key techniques and tools required to train a deep learning model in this virtual hands-on workshop.
]]>Delve into how TMA Solutions is accelerating original ML and AI workflows with RAPIDS.
]]>Read this tutorial on how to tap into GPUs by importing cuDF instead of pandas�Cwith only a few code changes.
]]>Gathering business insights can be a pain, especially when you��re dealing with countless data points. It��s no secret that GPUs can be a time-saver for data scientists. Rather than wait for a single query to run, GPUs help speed up the process and get you the insights you need quickly. In this video, Allan Enemark, RAPIDS data visualization lead, uses a US Census dataset with over 300��
]]>Visualization brings data to life, unveiling hidden patterns and insights through accessible visuals, and empowering you and your organization to perceive the invisible, make informed decisions, and fully leverage your data. Especially when working with large datasets, interaction can be difficult as render and compute times become prohibitive. Switching to RAPIDS libraries, such as cuDF��
]]>In the high-frequency trading world, thousands of market participants interact daily. In fact, high-frequency trading accounts for more than half of the US equity trading volume, according to the paper High-Frequency Trading Synchronizes Prices in Financial Markets. Market makers are the big players on the sell side who provide liquidity in the market. Speculators are on the buy side��
]]>This post is part of a series on accelerated data analytics. Digital advancements in climate modeling, healthcare, finance, and retail are generating unprecedented volumes and types of data. IDC says that by 2025, there will be 180 ZB of data compared to 64 ZB in 2020, scaling up the need for data analytics to turn all that data into insights. NVIDIA provides the RAPIDS suite of��
]]>This post is part of a series on accelerated data analytics. Update: The below blog describes how to use GPU-only RAPIDS cuDF, which requires code changes. RAPIDS cuDF now has a CPU/GPU interoperability (cudf.pandas) that speeds up pandas code by up to 150x with zero code changes. At GTC 2024, NVIDIA announced that the cudf.pandas library is now GA. At Google I/O��
]]>In the machine learning and MLOps world, GPUs are widely used to speed up model training and inference, but what about the other stages of the workflow like ETL pipelines or hyperparameter optimization? Within the RAPIDS data science framework, ETL tools are designed to have a familiar look and feel to data scientists working in Python. Do you currently use Pandas, NumPy, Scikit-learn��
]]>Over the past few releases, the NVIDIA cuDF team has added several new features to user-defined functions (UDFs) that can streamline the development process while improving overall performance. In this post, I walk through the new UDF enhancements and show how you can take advantage of them within your own applications: If you��re not familiar with pandas, series apply is the main��
]]>This series on the RAPIDS ecosystem explores the various aspects that enable you to solve extract, transform, load (ETL) problems, build machine learning (ML) and deep learning (DL) models, explore expansive graphs, process signal and system logs, or use the SQL language through BlazingSQL to process data. For part 1, see Pandas DataFrame Tutorial: A Beginner��s Guide to GPU Accelerated DataFrames��
]]>This post is the first installment of the series of introductions to the RAPIDS ecosystem. The series explores and discusses various aspects of RAPIDS that allow its users solve ETL (Extract, Transform, Load) problems, build ML (Machine Learning) and DL (Deep Learning) models, explore expansive graphs, process geospatial, signal, and system log data, or use SQL language via BlazingSQL to process��
]]>