In traditional clinical practice, treatment decisions are often based on general guidelines, past experience, and trial-and-error approaches. Today, with access to electronic medical records (EMRs) and genomic data, a new era of precision medicine is emerging: one where treatments are tailored to individual patients with unprecedented accuracy. Precision medicine is an innovative approach…
The world of big data analytics is constantly seeking ways to accelerate processing and reduce infrastructure costs. Apache Spark has become a leading platform for scale-out analytics, handling massive datasets for ETL, machine learning, and deep learning workloads. While Spark has traditionally been CPU-based, GPU acceleration offers a compelling promise: significant speedups for data processing…
Apache Spark is an industry-leading platform for big data processing and analytics. With the increasing prevalence of unstructured data such as documents, emails, and multimedia content, deep learning (DL) and large language models (LLMs) have become core components of the modern data analytics pipeline. These models enable a variety of downstream tasks, such as image captioning, semantic tagging…
As data sizes have grown in enterprises across industries, Apache Parquet has become a prominent format for storing data. Apache Parquet is a columnar storage format designed for efficient data processing at scale. By organizing data by columns rather than rows, Parquet enables high-performance querying and analysis, as it can read only the necessary columns for a query instead of scanning entire…
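As a minimal PySpark sketch of that column-pruning behavior (the file path and column names here are hypothetical), only the selected columns are read from the Parquet files rather than whole rows:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-column-pruning").getOrCreate()

# Hypothetical dataset: because Parquet stores each column contiguously,
# only the two selected columns are read from disk for this query.
sales = spark.read.parquet("/data/sales.parquet")
revenue_by_region = (
    sales.select("region", "revenue")   # column pruning: untouched columns are skipped
         .groupBy("region")
         .sum("revenue")
)
revenue_by_region.show()
```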
The NVIDIA Grace CPU Superchip delivers outstanding performance and best-in-class energy efficiency for CPU workloads in the data center and in the cloud. The benefits of NVIDIA Grace include high-performance Arm Neoverse V2 cores, the fast NVIDIA-designed Scalable Coherency Fabric, and low-power, high-bandwidth LPDDR5X memory. These features make the Grace CPU ideal for data processing with…
The NVIDIA RAPIDS Accelerator for Apache Spark software plug-in pioneered a zero code change user experience (UX) for GPU-accelerated data processing. It accelerates existing Apache Spark SQL and DataFrame-based applications on NVIDIA GPUs by over 9x without requiring a change to your queries or source code. This led to the new Spark RAPIDS ML Python library, which can speed up…
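A rough sketch of what "zero code change" means in practice: the plug-in is enabled through Spark configuration rather than code edits. The exact jar placement and resource settings depend on your cluster, so treat the values below as illustrative only:

```python
from pyspark.sql import SparkSession

# Illustrative configuration only: the RAPIDS Accelerator jar must already be
# on the classpath, and resource settings vary by cluster.
spark = (
    SparkSession.builder
    .appName("rapids-accelerated-etl")
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")  # enable the plug-in
    .config("spark.rapids.sql.enabled", "true")
    .getOrCreate()
)

# Existing SQL/DataFrame code runs unchanged; supported operators execute on the GPU.
spark.read.parquet("/data/events.parquet").groupBy("event_type").count().show()
```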
JSON is a popular format for text-based data that allows for interoperability between systems in web applications as well as data management. The format has existed since the early 2000s and grew out of the need for communication between web servers and browsers. The standard JSON format consists of key-value pairs that can include nested objects. JSON has grown in usage for storing web…
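For illustration, here is a small sketch of that key-value structure with a nested object, loaded into a Spark DataFrame (the records and field names are made up):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-example").getOrCreate()

# Each record is a set of key-value pairs; "address" is a nested object.
records = [
    '{"id": 1, "name": "Ada", "address": {"city": "London", "zip": "NW1"}}',
    '{"id": 2, "name": "Lin", "address": {"city": "Taipei", "zip": "100"}}',
]
df = spark.read.json(spark.sparkContext.parallelize(records))

# Nested fields can be addressed with dot notation.
df.select("id", "name", "address.city").show()
```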
With the rapid growth of generative AI, CIOs and IT leaders are looking for ways to reclaim data center resources to accommodate new AI use cases that promise greater return on investment without impacting current operations. This is leading IT decision makers to reassess past infrastructure decisions and explore strategies to consolidate traditional workloads into fewer…
With AI introducing an unprecedented pace of technological innovation, staying ahead means keeping your skills up to date. The NVIDIA Developer Program gives you the tools, training, and resources you need to succeed with the latest advancements across industries. We're excited to announce the following five new technical courses from NVIDIA. Join the Developer Program now to get hands-on…
As the scale of available data continues to grow, so does the need for scalable and intelligent data processing systems that can swiftly extract useful knowledge. Especially in high-stakes domains such as life sciences and finance, transparency of data-driven processes becomes paramount alongside scalability to ensure trustworthiness. Started by scientists coming from the Knowledge…
Spark RAPIDS ML is an open-source Python package enabling NVIDIA GPU acceleration of PySpark MLlib. It offers PySpark MLlib DataFrame API compatibility and speedups when training with the supported algorithms. See New GPU Library Lowers Compute Costs for Apache Spark ML for more details. PySpark MLlib DataFrame API compatibility means easier incorporation into existing PySpark ML applications…
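A hedged sketch of what that compatibility typically looks like: swap the import and keep the rest of the PySpark ML code the same. The dataset, column names, and parameters below are illustrative, and details may vary by algorithm and library version:

```python
from pyspark.sql import SparkSession
from pyspark.ml.linalg import Vectors
# GPU-accelerated drop-in for pyspark.ml.clustering.KMeans (assumed import path):
from spark_rapids_ml.clustering import KMeans

spark = SparkSession.builder.appName("spark-rapids-ml-demo").getOrCreate()

# Tiny illustrative dataset; real workloads would use large DataFrames.
train_df = spark.createDataFrame(
    [(Vectors.dense([0.0, 0.0]),),
     (Vectors.dense([1.0, 1.0]),),
     (Vectors.dense([9.0, 8.0]),)],
    ["features"],
)

kmeans = KMeans(k=2, featuresCol="features", predictionCol="cluster")
model = kmeans.fit(train_df)          # same estimator/model API as PySpark MLlib
model.transform(train_df).show()
```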
Dive into the RAPIDS Accelerator for Apache Spark toolset, including the workload qualification tool for estimating speedup on GPU and the profiling tool for tuning jobs.
Streamline and accelerate deployment by integrating ETL and ML training into a single Apache Spark script on Amazon EMR.
Running extract-transform-load (ETL) operations on large-scale data with GPUs using the NVIDIA RAPIDS Accelerator for Apache Spark can produce both cost savings and performance gains. We demonstrated this in our previous post, GPUs for ETL? Run Faster, Less Costly Workloads with NVIDIA RAPIDS Accelerator for Apache Spark and Databricks. In this post, we dive deeper to identify precisely which…
We were stuck. Really stuck. With a hard delivery deadline looming, our team needed to figure out how to process a complex extract-transform-load (ETL) job on trillions of point-of-sale transaction records in a few hours. The results of this job would feed a series of downstream machine learning (ML) models that would make critical retail assortment allocation decisions for a global retailer.
Apache Spark is an industry-leading platform for distributed extract, transform, and load (ETL) workloads on large-scale data. However, with the advent of deep learning (DL), many Spark practitioners have sought to add DL models to their data processing pipelines across a variety of use cases like sales predictions, content recommendations, sentiment analysis, and fraud detection. Yet…
When you see a context-relevant advertisement on a web page, it's most likely content served by a Taboola data pipeline. For Taboola, the leading content recommendation company in the world, a big challenge was the frequent need to scale Apache Spark CPU cluster capacity to address constantly growing compute and storage requirements. Data center capacity and hardware costs are always…
Spark MLlib is a key component of Apache Spark for large-scale machine learning and provides built-in implementations of many popular machine learning algorithms. These implementations were created a decade ago, but do not leverage modern computing accelerators, such as NVIDIA GPUs. To address this gap, we have recently open-sourced Spark RAPIDS ML (NVIDIA/spark-rapids-ml)…
The Dataiku platform for everyday AI simplifies deep learning. Use cases are far-reaching, from image classification to object detection and natural language processing (NLP). Dataiku helps you with labeling, model training, explainability, model deployment, and centralized management of code and code environments. This post dives into high-level Dataiku and NVIDIA integrations for image…
Generative AI has marked an important milestone in the AI revolution journey. We are at a fundamental breaking point where enterprises are not only getting their feet wet but jumping into the deep end. With over 50 frameworks, pretrained models, and development tools, NVIDIA AI Enterprise, the software layer of the NVIDIA AI platform, is designed to accelerate enterprises to the leading edge…
A retailer's supply chain includes sourcing raw materials or finished goods from suppliers; storing them in warehouses or distribution centers; transporting them to stores or customers; and managing sales. Retailers also collect, store, and analyze data to optimize supply chain performance, and they have teams responsible for managing each stage of the supply chain…
Learn about the latest AI and data science breakthroughs from leading data science teams at NVIDIA GTC 2023.
According to IDC, the volume of data generated each year is growing exponentially. IDC's Global DataSphere projects that the world will generate 221 ZB of data by 2026. This data holds fantastic information. But as the volume of data grows, so does the processing cost. As a data scientist or engineer, you've certainly felt the pain of slow-running data-processing jobs.
NVIDIA revealed major updates to its suite of AI software for developers, including JAX, NVIDIA CV-CUDA, and NVIDIA RAPIDS. To learn about the latest SDK advancements from NVIDIA, watch the keynote from CEO Jensen Huang. Just today at GTC 2022, NVIDIA introduced JAX on NVIDIA AI, the newest addition to its GPU-accelerated deep learning frameworks. JAX is a rapidly growing…
Learn about the latest AI and data science breakthroughs from the world's leading data science teams at GTC 2022.
RAPIDS Accelerator for Apache Spark v21.10 is now available! As an open source project, we value our community, their voice, and their requests. This release delivers on community requests for operations that are ideally suited for GPU acceleration. Important callouts for this release: RAPIDS Accelerator for Apache Spark is growing at a great pace in both functionality and…
NVIDIA GTC is the must-attend AI conference for developers. It's a place where practitioners, leaders, and innovators share their ideas about the latest trends in data science. Here are six top data science GTC sessions worth attending. Thursday, Nov 11, 5:00 AM – 5:25 AM PST: Domino's Pizza delivers thousands of pizzas a day and requires real-time planning and logistics capabilities.
The August release (21.08) of RAPIDS Accelerator for Apache Spark is now available. It has been a little over a year since the first release at NVIDIA GTC 2020. We have improved in so many ways, particularly in terms of ease of use, with minimal to no code changes for Apache Spark applications. This last year, the team has been focused on adding functionality and continuously improving…
Azure recently announced support for NVIDIA's T4 Tensor Core Graphics Processing Units (GPUs), which are optimized for deploying machine learning inferencing or analytical workloads in a cost-effective manner. With Apache Spark deployments tuned for NVIDIA GPUs, plus pre-installed libraries, Azure Synapse Analytics offers a simple way to leverage GPUs to power a variety of data processing and…
RAPIDS Accelerator for Apache Spark v21.06 is here! You may notice right away that we've had a huge leap in version number since we announced our last release. Don't worry, you haven't missed anything. RAPIDS Accelerator is built on cuDF, part of the RAPIDS ecosystem. RAPIDS transitioned to calendar versioning (CalVer) in the last release, and, from now on, our releases will follow the same…
Editor's Note: Get notified and be the first to download our real-world blueprint once it's available. This is the third installment in a series describing an end-to-end blueprint for predicting customer churn. In previous installments, we've discussed some of the challenges of machine learning systems that don't appear until you get to production: In the first installment…
Recommender systems drive engagement on many of the most popular online platforms. As data volume grows exponentially, data scientists increasingly turn from traditional machine learning methods to highly expressive deep learning models to improve recommendation quality. Often, the recommendations are framed as modeling the completion of a user-item matrix, in which the user-item entry is the…
Editor's Note: Get notified and be the first to download our real-world blueprint once it's available. If you want to solve a particular kind of business problem with machine learning, you'll likely have no trouble finding a tutorial showing you how to extract features and train a model. However, building machine learning systems isn't just about training models or even about finding the best…
With the growing interest in deep learning (DL), more and more users are using DL in production environments. Because DL requires intensive computational power, developers are leveraging GPUs to do their training and inference jobs. Recently, as part of a major Apache Spark initiative to better unify DL and data processing on Spark, GPUs became a schedulable resource in Apache Spark 3.
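As a hedged sketch of how that scheduling is typically requested, GPU resources are declared through Spark configuration; the amounts and the discovery script path below are illustrative and depend on your cluster:

```python
from pyspark import TaskContext
from pyspark.sql import SparkSession

# Illustrative resource settings; the discovery script path is hypothetical.
spark = (
    SparkSession.builder
    .appName("gpu-scheduling-demo")
    .config("spark.executor.resource.gpu.amount", "1")   # GPUs requested per executor
    .config("spark.task.resource.gpu.amount", "1")        # GPUs required per task
    .config("spark.executor.resource.gpu.discoveryScript",
            "/opt/spark/scripts/getGpusResources.sh")      # script that reports GPU addresses
    .getOrCreate()
)

def gpu_addresses(rows):
    # Inside a task, the assigned GPU addresses are available via the TaskContext.
    addrs = TaskContext.get().resources()["gpu"].addresses
    yield from ((row, addrs) for row in rows)

# Example usage on an existing DataFrame's underlying RDD (cluster must expose GPUs):
# spark.range(4).rdd.mapPartitions(gpu_addresses).collect()
```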
Apache Spark has emerged as the standard framework for large-scale, distributed data analytics processing. NVIDIA worked with the Apache Spark community to accelerate the world's most popular data analytics framework and to offer revolutionary GPU acceleration on several leading platforms, including Google Cloud, Databricks, and Cloudera. Now, Amazon EMR joins the list of leading platforms…
Apache Spark provides capabilities to program entire clusters with implicit data parallelism. With Spark 3.0 and the open source RAPIDS Accelerator for Spark, these capabilities are extended to GPUs. However, prior to this work, all CUDA operations happened in the default stream, causing implicit synchronization and failing to take advantage of concurrency on the GPU. In this post, we look at how to use…
Machine learning (ML) data is big and messy. Organizations have increasingly adopted RAPIDS and cuML to help their teams run experiments faster and achieve better model performance on larger datasets. That, in turn, accelerates the training of ML models using GPUs. With RAPIDS, data scientists can now train models 100X faster and more frequently. Like RAPIDS, we've ensured that our data logging…
At GTC Spring 2020, Adobe, Verizon Media, and Uber each discussed how they used Spark 3.0 with GPUs to accelerate and scale ML big data pre-processing, training, and tuning pipelines. There are multiple challenges when it comes to the performance of large-scale machine learning (ML) solutions: huge datasets, complex data preprocessing and feature engineering pipelines…
Apache Spark continued the effort to analyze big data that Apache Hadoop started over 15 years ago and has become the leading framework for large-scale distributed data processing. Today, hundreds of thousands of data engineers and scientists are working with Spark across 16,000+ enterprises and organizations. One reason why Spark has taken the torch from Hadoop is because it can process data…
Given the parallel nature of many data processing tasks, it's only natural that the massively parallel architecture of a GPU should be able to parallelize and accelerate Apache Spark data processing queries, in the same way that a GPU accelerates deep learning (DL) in artificial intelligence (AI). NVIDIA has worked with the Apache Spark community to implement GPU acceleration through the…
Did you see the White House's recent initiative on Precision Medicine and how it is transforming the ways we can treat cancer? Have you avoided clicking on a malicious website based on OpenDNS's SecureRank predictive analytics? Are you using the Wikidata Query Service to gather data to use in your machine learning or deep learning application? If so, you have seen the power of graph applications.