As we move toward denser computing infrastructure, with more compute, more GPUs, and accelerated networking, multi-GPU training and analysis grow in popularity. Developers and practitioners need tools and best practices as they move from CPU to GPU clusters. RAPIDS is a suite of open-source GPU-accelerated data science and AI libraries. These libraries can easily scale out for…
In the machine learning and MLOps world, GPUs are widely used to speed up model training and inference, but what about the other stages of the workflow, like ETL pipelines or hyperparameter optimization? Within the RAPIDS data science framework, ETL tools are designed to have a familiar look and feel to data scientists working in Python. Do you currently use Pandas, NumPy, Scikit-learn…
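As a rough illustration of that familiarity (not code from the post), the cuDF DataFrame library in RAPIDS follows the pandas API closely, so a typical groupby and aggregation reads the same on GPU; the table below is made up for the sketch.

    import cudf

    # A small, made-up table; column names and values are illustrative only.
    df = cudf.DataFrame(
        {
            "region": ["east", "west", "east", "west"],
            "revenue": [100.0, 80.0, 120.0, 95.0],
        }
    )

    # The same groupby/aggregate idiom you would write with pandas,
    # executed on the GPU.
    summary = df.groupby("region")["revenue"].sum().sort_values(ascending=False)
    print(summary)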
As datasets continue to grow in size, the adoption of cloud-storage platforms like Amazon S3 and Google Cloud Storage (GCS) is becoming more common. Although node-local storage is likely to deliver better IO performance, this approach becomes impractical once the dataset exceeds the single-terabyte scale. For cases where remote storage is the only practical solution…
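As a sketch of what reading directly from remote storage can look like (the bucket path is hypothetical, and it assumes s3fs is installed so fsspec can resolve the S3 URL), dask_cudf can load partitioned Parquet files straight from object storage:

    import dask_cudf

    # Each worker reads its own subset of the Parquet files directly from S3,
    # so the full dataset never has to fit on any one node's local disk.
    ddf = dask_cudf.read_parquet(
        "s3://my-bucket/dataset/*.parquet",          # hypothetical bucket/path
        storage_options={"anon": False},             # use configured AWS credentials
    )
    print(ddf.npartitions, list(ddf.columns))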
Python is no stranger to data scientists. It ranks as the most popular programming language and is widely used for all kinds of tasks. Though Python is notoriously slow when code is interpreted at runtime, many popular libraries make it run efficiently on GPUs for certain data science work. For example, popular deep learning frameworks such as TensorFlow and PyTorch help AI researchers to…
GPU-accelerated computing is a game-changer for data practitioners and enterprises, but leveraging GPUs can be challenging for data professionals. RAPIDS addresses these challenges by abstracting the complexities of accelerated data science behind familiar interfaces. With RAPIDS, practitioners can quickly accelerate data science workloads on NVIDIA GPUs, reducing operations like data…
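As one example of those familiar interfaces (a minimal sketch with made-up data, not code from the post), cuML keeps the scikit-learn estimator pattern, so fitting a model on the GPU looks like the CPU version:

    import cudf
    from cuml.cluster import KMeans

    # Tiny, made-up 2-D dataset built directly on the GPU.
    df = cudf.DataFrame(
        {"x": [1.0, 1.1, 8.0, 8.2], "y": [0.9, 1.2, 7.9, 8.1]}
    )

    # The scikit-learn-style fit/attributes API, running on the GPU.
    model = KMeans(n_clusters=2, random_state=0)
    model.fit(df)
    print(model.labels_)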
In XGBoost 1.0, we introduced a new official Dask interface to support efficient distributed training. Fast-forward to XGBoost 1.4, and the interface is now feature-complete. If you are new to the XGBoost Dask interface, see the first post for a gentle introduction. In this post, we look at simple code examples showing how to maximize the benefits of GPU acceleration.
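As a minimal sketch of GPU training through the Dask interface (not code from the post; it assumes dask-cuda is installed and uses a synthetic random dataset), the workflow is to connect a client to a CUDA cluster, wrap the Dask collections in a DaskDMatrix, and train with a GPU histogram tree method:

    import xgboost as xgb
    from dask import array as da
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster

    if __name__ == "__main__":
        # One GPU worker on the local machine; real clusters would use more.
        with LocalCUDACluster(n_workers=1) as cluster, Client(cluster) as client:
            # Synthetic training data; GPU-backed inputs (dask_cudf or CuPy)
            # would avoid host-to-device copies.
            X = da.random.random((100_000, 20), chunks=(10_000, 20))
            y = da.random.random(100_000, chunks=10_000)

            dtrain = xgb.dask.DaskDMatrix(client, X, y)
            output = xgb.dask.train(
                client,
                {"tree_method": "gpu_hist"},  # GPU-accelerated histogram algorithm
                dtrain,
                num_boost_round=50,
            )
            print(output["history"])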
This is the third installment of the series of introductions to the RAPIDS ecosystem. The series explores and discusses various aspects of RAPIDS that allow its users to solve ETL (Extract, Transform, Load) problems, build ML (Machine Learning) and DL (Deep Learning) models, explore expansive graphs, process geospatial, signal, and system log data, or use SQL via BlazingSQL to process data.
At NVIDIA, we are driving change in data science, machine learning, and artificial intelligence. At the intersection of the key trends driving us is Dask, an open-source library designed to provide parallelism to the existing Python stack. In this post, we talk about Dask, what it is, how we use it at NVIDIA, and why it has so much potential…
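As a minimal illustration of that parallelism (not code from the post), Dask splits a NumPy-style array into chunks and builds a lazy task graph that only runs when compute() is called:

    import dask.array as da

    # A 10,000 x 10,000 array split into 1,000 x 1,000 chunks.
    x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))

    # NumPy-style expression; this only builds a task graph, nothing runs yet.
    result = (x + x.T).mean(axis=0)

    # compute() executes the graph, scheduling chunk-level tasks in parallel.
    print(result.compute()[:5])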