DASK – NVIDIA Technical Blog News and tutorials for developers, data scientists, and IT admins 2025-07-03T22:20:47Z http://www.open-lab.net/blog/feed/
Ben Zaitlen https://www.linkedin.com/in/benjamin-zaitlen-62ab7b4/ <![CDATA[Best Practices for Multi-GPU Data Analysis Using RAPIDS with Dask]]> http://www.open-lab.net/blog/?p=92480 2024-12-12T19:38:40Z 2024-11-21T19:02:03Z

As we move toward denser computing infrastructure, with more compute, more GPUs, accelerated networking, and so forth, multi-GPU training and analysis grow in popularity. We need tools and best practices as developers and practitioners move from CPU to GPU clusters. RAPIDS is a suite of open-source GPU-accelerated data science and AI libraries. These libraries can easily scale out for...

Jacob Tomlinson <![CDATA[Accelerating ETL on KubeFlow with RAPIDS]]> http://www.open-lab.net/blog/?p=54194 2023-11-10T01:32:59Z 2022-08-30T20:58:47Z

In the machine learning and MLOps world, GPUs are widely used to speed up model training and inference, but what about the other stages of the workflow like ETL pipelines or hyperparameter optimization? Within the RAPIDS data science framework, ETL tools are designed to have a familiar look and feel to data scientists working in Python. Do you currently use Pandas, NumPy, Scikit-learn...

Rick Zamora <![CDATA[Optimizing Access to Parquet Data with fsspec]]> http://www.open-lab.net/blog/?p=46692 2023-06-12T20:50:59Z 2022-05-05T20:16:22Z

As datasets continue to grow in size, cloud-storage platforms like Amazon S3 and Google Cloud Storage (GCS) are becoming more popular. Although node-local storage is likely to result in better IO performance, this approach can become impractical after the dataset exceeds the single-terabyte scale. For cases where remote storage is the only practical solution...

Yi Dong <![CDATA[Accelerated Portfolio Construction with Numba and Dask in Python]]> http://www.open-lab.net/blog/?p=38831 2023-03-14T18:32:08Z 2021-10-21T18:30:00Z

Python is no stranger to data scientists. It ranks as the most popular programming language and is widely used for all kinds of tasks. Though Python is notoriously slow when the code is interpreted at runtime, many popular libraries make it run efficiently on GPUs for certain data science work. For example, popular deep learning frameworks such as TensorFlow and PyTorch help AI researchers to...

Jacob Schmitt <![CDATA[Zero to RAPIDS in Minutes with NVIDIA GPUs + Saturn Cloud]]> http://www.open-lab.net/blog/?p=36243 2022-08-21T23:52:30Z 2021-08-31T15:00:00Z

GPU-accelerated computing is a game-changer for data practitioners and enterprises, but leveraging GPUs can be challenging for data professionals. RAPIDS remediates these challenges by abstracting the complexities of accelerated data science through familiar interfaces. When using RAPIDS, practitioners can quickly accelerate data science workloads on NVIDIA GPUs, reducing operations like data...

Belen Tegegn <![CDATA[Accelerating XGBoost on GPU Clusters with Dask]]> http://www.open-lab.net/blog/?p=33020 2022-08-21T23:51:58Z 2021-06-17T15:00:00Z

In XGBoost 1.0, we introduced a new official Dask interface to support efficient distributed training. Fast-forwarding to XGBoost 1.4, the interface is now feature-complete. If you are new to the XGBoost Dask interface, look at the first post for a gentle introduction. In this post, we look at simple code examples, showing how to maximize the benefits of GPU acceleration.

Tom Drabas <![CDATA[Dask Tutorial – Beginner’s Guide to Distributed Computing with GPUs in Python]]> http://www.open-lab.net/blog/?p=24732 2022-08-21T23:41:08Z 2021-03-18T23:45:22Z

This is the third installment of the series of introductions to the RAPIDS ecosystem. The series explores and discusses various aspects of RAPIDS that allow its users to solve ETL (Extract, Transform, Load) problems, build ML (Machine Learning) and DL (Deep Learning) models, explore expansive graphs, process geospatial, signal, and system log data, or use SQL via BlazingSQL to process data.

Jacob Schmitt <![CDATA[Making Python Data Science Enterprise-Ready with Dask]]> http://www.open-lab.net/blog/?p=21187 2023-03-22T01:09:02Z 2020-10-05T19:05:00Z

At NVIDIA, we are driving change in data science, machine learning, and artificial intelligence. Some of the key trends that drive us are as follows: At the intersection of these trends is Dask, an open-source library designed to provide parallelism to the existing Python stack. In this post, we talk about Dask, what it is, how we use it at NVIDIA, and why it has so much potential...

Yi Dong <![CDATA[Accelerating Python for Exotic Option Pricing]]> http://www.open-lab.net/blog/?p=16723 2022-08-21T23:39:50Z 2020-03-19T22:49:44Z

In finance, computational efficiency can sometimes be converted directly into trading profits. Quants are facing the challenges of trading off research efficiency...
