Data Analytics / Processing

May 19, 2025
Spotlight: Atgenomix SeqsLab Scales Health Omics Analysis for Precision Medicine
In traditional clinical medical practice, treatment decisions are often based on general guidelines, past experiences, and trial-and-error approaches. Today,...
9 MIN READ

May 15, 2025
Predicting Performance on Apache Spark with GPUs
The world of big data analytics is constantly seeking ways to accelerate processing and reduce infrastructure costs. Apache Spark has become a leading platform...
9 MIN READ

May 01, 2025
Stacking Generalization with HPO: Maximize Accuracy in 15 Minutes with NVIDIA cuML
Stacking generalization is a widely used technique among machine learning (ML) engineers, where multiple models are combined to boost overall predictive...
7 MIN READ

Apr 17, 2025
Grandmaster Pro Tip: Winning First Place in Kaggle Competition with Feature Engineering using NVIDIA cuDF-pandas
Feature engineering remains one of the most effective ways to improve model accuracy when working with tabular data. Unlike domains such as NLP and computer...
5 MIN READ

Apr 03, 2025
Accelerating Apache Parquet Scans on Apache Spark with GPUs
As data sizes have grown in enterprises across industries, Apache Parquet has become a prominent format for storing data. Apache Parquet is a columnar storage...
8 MIN READ

Jan 29, 2025
Accelerating JSON Processing on Apache Spark with GPUs
JSON is a popular format for text-based data that allows for interoperability between systems in web applications as well as data management. The format has...
9 MIN READ

Jan 16, 2025
Accelerating Time Series Forecasting with RAPIDS cuML
Time series forecasting is a powerful data science technique used to predict future values based on data points from the past. Open source Python libraries like...
4 MIN READ

Jan 13, 2025
Enhancing Generative AI Model Accuracy with NVIDIA NeMo Curator
In the rapidly evolving landscape of artificial intelligence, the quality of the data used for training models is paramount. High-quality data ensures that...
5 MIN READ

Dec 12, 2024
Harnessing GPU Acceleration for Multi-Label Classification with RAPIDS cuML
Modern classification workflows often require classifying individual records and data points into multiple categories instead of just assigning a single label....
4 MIN READ

Dec 05, 2024
Unified Virtual Memory Supercharges pandas with RAPIDS cuDF
cuDF-pandas, introduced in a previous post, is a GPU-accelerated library that accelerates pandas to deliver significant performance improvements—up to 50x...
5 MIN READ

Nov 14, 2024
Faster Causal Inference on Large Datasets with NVIDIA RAPIDS
As consumer applications generate more data than ever before, enterprises are turning to causal inference methods for observational data to help shed light on...
4 MIN READ

Oct 15, 2024
Train Highly Accurate LLMs with the Zyda-2 Open 5T-Token Dataset Processed with NVIDIA NeMo Curator
Open-source datasets have significantly democratized access to high-quality data, lowering the barriers to entry for developers and researchers to train...
5 MIN READ

Oct 04, 2024
Just Released: NVIDIA NeMo Curator Improvements for Accelerating Data Curation
NeMo Curator now supports images, enabling you to process data for training accurate generative AI models.
1 MIN READ

Sep 17, 2024
Polars GPU Engine Powered by RAPIDS cuDF Now Available in Open Beta
Today, Polars released a new GPU engine powered by RAPIDS cuDF that accelerates Polars workflows up to 13x on NVIDIA GPUs, allowing data scientists to process...
4 MIN READ

Aug 30, 2024
Accelerating Predictive Maintenance in Manufacturing with RAPIDS AI
The International Society of Automation (ISA) reports that 5% of plant production is lost annually due to downtime. Putting that into a different context,...
12 MIN READ

Aug 29, 2024
Just Released: RAPIDS 24.08
RAPIDS 24.08 is now available with significant updates geared towards processing larger workloads and seamless CPU/GPU interoperability.
1 MIN READ