Scikit-learn, the most widely used ML library, is popular for processing tabular data because of its simple API, diversity of algorithms, and compatibility with popular Python libraries such as pandas and NumPy. NVIDIA cuML now lets data scientists and machine learning engineers keep using familiar scikit-learn APIs and Python libraries while harnessing the power of CUDA on…
RAPIDS 24.12 introduces cuDF packages to PyPI, speeds up aggregations and reading files from AWS S3, enables larger-than-GPU-memory queries in the Polars GPU engine, and delivers faster graph neural network (GNN) training on real-world graphs. Starting with the 24.12 release of RAPIDS, CUDA 12 builds of , , , and all of their dependencies are now available on PyPI. As a result…
Modern classification workflows often require classifying individual records and data points into multiple categories instead of just assigning a single label. Open-source Python libraries like scikit-learn make it easier to build models for these multi-label problems. Several models have built-in support for multi-label datasets, and a simple scikit-learn utility function enables using those…
As consumer applications generate more data than ever before, enterprises are turning to causal inference methods for observational data to help shed light on how changes to individual components of their app impact key business metrics. Over the last decade, econometricians have developed a technique called double machine learning that brings the power of machine learning models to causal…
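To make the idea concrete, here is a minimal, stdlib-only sketch of the "partialling-out" form of double machine learning with 2-fold cross-fitting. It assumes a single observed confounder `x`, a treatment `t`, and an outcome `y`, and it uses plain linear fits as stand-ins for the ML nuisance models; `fit_line` and `dml_effect` are hypothetical helpers for illustration, not functions from any library. The idea is to predict both `y` and `t` from `x` on held-out folds, then regress the outcome residuals on the treatment residuals.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def dml_effect(x, t, y, seed=0):
    """Partialling-out estimate of the effect of t on y with 2-fold cross-fitting.

    Assumption for this sketch: the nuisance models E[y|x] and E[t|x] are
    linear fits; in practice any ML regressor could be plugged in instead.
    """
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[::2], idx[1::2]]
    res_t, res_y = [], []
    for k in range(2):
        # Fit nuisance models on one fold, compute residuals on the other.
        train, held_out = folds[1 - k], folds[k]
        ay, by = fit_line([x[i] for i in train], [y[i] for i in train])
        at, bt = fit_line([x[i] for i in train], [t[i] for i in train])
        for i in held_out:
            res_y.append(y[i] - (ay + by * x[i]))
            res_t.append(t[i] - (at + bt * x[i]))
    # Final stage: regress outcome residuals on treatment residuals (no intercept).
    return sum(a * b for a, b in zip(res_t, res_y)) / sum(a * a for a in res_t)


if __name__ == "__main__":
    # Synthetic data where x confounds both t and y; the true effect of t is 2.0.
    rng = random.Random(42)
    x = [rng.gauss(0, 1) for _ in range(5000)]
    t = [0.5 * xi + rng.gauss(0, 1) for xi in x]
    y = [2.0 * ti + 3.0 * xi + rng.gauss(0, 1) for ti, xi in zip(t, x)]
    print(round(dml_effect(x, t, y), 2))  # should land near 2.0
```

On this synthetic data, a naive regression of `y` on `t` alone would be pulled upward by the confounder (its expected slope is 3.2), while the residual-on-residual estimate recovers a value close to the true 2.0. Cross-fitting keeps each residual from being computed by a model that saw that same row during training.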
The RAPIDS v24.10 release takes another step forward in bringing accelerated computing to data scientists and developers with a seamless user experience. This blog post highlights the new features, including: NetworkX accelerated by RAPIDS cuGraph is now GA in the 24.10 release beginning with NetworkX 3.4. This release adds GPU-accelerated graph creation, a new user experience…
UMAP is a popular dimension reduction algorithm used in fields like bioinformatics, NLP topic modeling, and ML preprocessing. It works by creating a k-nearest neighbors (k-NN) graph, known in the literature as an all-neighbors graph, and using it to build a fuzzy topological representation of the data. That representation is then used to embed the high-dimensional data into lower dimensions. RAPIDS cuML already contained…
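The all-neighbors k-NN graph is the first step UMAP relies on. As a rough illustration, here is a minimal brute-force sketch in plain Python that, for every point, lists the indices of its `k` nearest other points by Euclidean distance. `knn_graph` is a hypothetical helper, not a cuML API, and brute force costs O(n²) distance computations; at scale, libraries generally rely on GPU-accelerated or approximate methods instead.

```python
import math


def knn_graph(points, k):
    """Brute-force all-neighbors k-NN graph.

    Returns a list where entry i holds the indices of the k points closest
    to points[i] (excluding i itself), ordered from nearest to farthest.
    """
    graph = []
    for i, p in enumerate(points):
        # Distance from p to every other point, paired with that point's index.
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        graph.append([j for _, j in dists[:k]])
    return graph


if __name__ == "__main__":
    # Two well-separated clusters; each point's neighbors stay in its own cluster.
    pts = [(0, 0), (1, 0), (0, 3), (10, 10), (12, 10)]
    print(knn_graph(pts, 1))
```

Each row of the result is a directed adjacency list. UMAP then turns these neighbor lists into weighted, symmetrized edges to form its fuzzy graph, a step this sketch does not attempt.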
Polars, one of the fastest-growing data analytics tools, has just crossed 9M monthly downloads. As a modern DataFrame library, it is designed for efficiently processing datasets that fit on a single machine, without the overhead and complexity of distributed computing systems that are required for massive-scale workloads. As enterprises grapple with complex data problems—ranging from…
At Google I/O’24, Laurence Moroney, head of AI Advocacy at Google, announced that RAPIDS cuDF is now integrated into Google Colab. Developers can now instantly accelerate pandas code up to 50x on Google Colab GPU instances, and continue using pandas as data grows—without sacrificing performance. RAPIDS cuDF is a GPU DataFrame library that accelerates the data processing tool pandas with zero…
At NVIDIA GTC 2024, it was announced that RAPIDS cuDF can now bring GPU acceleration to 9.5 million pandas users without requiring them to change their code. Update: RAPIDS cuDF now instantly accelerates pandas with zero code changes in Google Colab. Try out the tutorial in a Colab notebook today. pandas, a flexible and powerful data analysis and manipulation library for Python…
NVIDIA and Snowflake announced a new partnership bringing accelerated computing to the Data Cloud with the new Snowpark Container Services (private preview), a runtime for developers to manage and deploy containerized workloads. By integrating the capabilities of GPUs and AI into the Snowflake platform, customers can enhance ML performance and efficiently fine-tune LLMs. They achieve this by…
HDBSCAN is a state-of-the-art, density-based clustering algorithm that has become popular in domains as varied as topic modeling, genomics, and geospatial analytics. RAPIDS cuML has provided accelerated HDBSCAN since the 21.10 release in October 2021, as detailed in GPU-Accelerated Hierarchical DBSCAN with RAPIDS cuML – Let’s Get Back To The Future. However, support for soft clustering (also…
To achieve state-of-the-art machine learning (ML) solutions, data scientists often build complex ML models. However, these models are computationally expensive, and until recently required extensive background knowledge, experience, and human effort. At GTC 21, AWS Senior Data Scientist Nick Erickson gave a session sharing how the combination of AutoGluon, RAPIDS…
This post was originally published on the RAPIDS AI blog. RAPIDS is about creating bridges, connections, and clean handoffs between GPU PyData libraries. Interoperability with functionality is our goal. For example, if you’re working with RAPIDS cuDF but need a more linear-algebra oriented function that exists in CuPy, you can leverage the interoperability of the GPU PyData ecosystem to…