Posts by Gregory Kimball
Data Center / Cloud
Mar 11, 2025
Efficient ETL with Polars and Apache Spark on NVIDIA Grace CPU
The NVIDIA Grace CPU Superchip delivers outstanding performance and best-in-class energy efficiency for CPU workloads in the data center and in the cloud. The...
7 MIN READ
Data Science
Feb 20, 2025
JSON Lines Reading with pandas 100x Faster Using NVIDIA cuDF
JSON is a widely adopted format for text-based information working interoperably between systems, most commonly in web applications and large language models...
10 MIN READ
Data Science
Nov 28, 2024
Supercharging Deduplication in pandas Using RAPIDS cuDF
A common operation in data analytics is to drop duplicate rows. Deduplication is critical in Extract, Transform, Load (ETL) workflows, where you might want to...
12 MIN READ
Data Science
Sep 11, 2024
Scaling Up to One Billion Rows of Data in pandas using RAPIDS cuDF
The One Billion Row Challenge is a fun benchmark to showcase basic data processing operations. It was originally launched as a pure-Java competition, and has...
11 MIN READ
Data Science
Jul 17, 2024
Encoding and Compression Guide for Parquet String Data Using RAPIDS
Parquet writers provide encoding and compression options that are turned off by default. Enabling these options may provide better lossless compression for your...
10 MIN READ
Data Science
Dec 15, 2023
Streamline ETL Workflows with Nested Data Types in RAPIDS libcudf
Nested data types are a convenient way to represent hierarchical relationships within columnar data. They are frequently used as part of extract, transform,...
10 MIN READ