Data Analytics / Processing

May 19, 2025

Spotlight: Atgenomix SeqsLab Scales Health Omics Analysis for Precision Medicine

In traditional clinical medical practice, treatment decisions are often based on general guidelines, past experiences, and trial-and-error approaches. Today,...

9 MIN READ

May 15, 2025

Predicting Performance on Apache Spark with GPUs

The world of big data analytics is constantly seeking ways to accelerate processing and reduce infrastructure costs. Apache Spark has become a leading platform...

9 MIN READ

May 01, 2025

Stacking Generalization with HPO: Maximize Accuracy in 15 Minutes with NVIDIA cuML

Stacking generalization is a widely used technique among machine learning (ML) engineers, where multiple models are combined to boost overall predictive...

7 MIN READ

Apr 17, 2025

Grandmaster Pro Tip: Winning First Place in Kaggle Competition with Feature Engineering using NVIDIA cuDF-pandas

Feature engineering remains one of the most effective ways to improve model accuracy when working with tabular data. Unlike domains such as NLP and computer...

5 MIN READ

Apr 03, 2025

Accelerating Apache Parquet Scans on Apache Spark with GPUs

As data sizes have grown in enterprises across industries, Apache Parquet has become a prominent format for storing data. Apache Parquet is a columnar storage...

8 MIN READ

Jan 29, 2025

Accelerating JSON Processing on Apache Spark with GPUs

JSON is a popular format for text-based data that allows for interoperability between systems in web applications as well as data management. The format has...

9 MIN READ

Jan 16, 2025

Accelerating Time Series Forecasting with RAPIDS cuML

Time series forecasting is a powerful data science technique used to predict future values based on data points from the past. Open source Python libraries like...

4 MIN READ

Jan 13, 2025

Enhancing Generative AI Model Accuracy with NVIDIA NeMo Curator

In the rapidly evolving landscape of artificial intelligence, the quality of the data used for training models is paramount. High-quality data ensures that...

5 MIN READ

Dec 12, 2024

Harnessing GPU Acceleration for Multi-Label Classification with RAPIDS cuML

Modern classification workflows often require classifying individual records and data points into multiple categories instead of just assigning a single label....

4 MIN READ

Dec 05, 2024

Unified Virtual Memory Supercharges pandas with RAPIDS cuDF

cuDF-pandas, introduced in a previous post, is a GPU-accelerated library that accelerates pandas to deliver significant performance improvements—up to 50x...

5 MIN READ

Nov 14, 2024

Faster Causal Inference on Large Datasets with NVIDIA RAPIDS

As consumer applications generate more data than ever before, enterprises are turning to causal inference methods for observational data to help shed light on...

4 MIN READ

Oct 15, 2024

Train Highly Accurate LLMs with the Zyda-2 Open 5T-Token Dataset Processed with NVIDIA NeMo Curator

Open-source datasets have significantly democratized access to high-quality data, lowering the barriers of entry for developers and researchers to train...

5 MIN READ

Oct 04, 2024

Just Released: NVIDIA NeMo Curator Improvements for Accelerating Data Curation

NeMo Curator now supports images, enabling you to process data for training accurate generative AI models.

1 MIN READ

Sep 17, 2024

Polars GPU Engine Powered by RAPIDS cuDF Now Available in Open Beta

Today, Polars released a new GPU engine powered by RAPIDS cuDF that accelerates Polars workflows up to 13x on NVIDIA GPUs, allowing data scientists to process...

4 MIN READ

Aug 30, 2024

Accelerating Predictive Maintenance in Manufacturing with RAPIDS AI

The International Society of Automation (ISA) reports that 5% of plant production is lost annually due to downtime. Putting that into a different context,...

12 MIN READ

Aug 29, 2024

Just Released: RAPIDS 24.08

RAPIDS 24.08 is now available with significant updates geared towards processing larger workloads and seamless CPU/GPU interoperability.

1 MIN READ