Development & Optimization – NVIDIA Technical Blog News and tutorials for developers, data scientists, and IT admins 2025-05-15T19:08:55Z http://www.open-lab.net/blog/feed/ Matt Ahrens <![CDATA[Predicting Performance on Apache Spark with GPUs]]> http://www.open-lab.net/blog/?p=100118 2025-05-15T19:07:19Z 2025-05-15T17:00:00Z The world of big data analytics is constantly seeking ways to accelerate processing and reduce infrastructure costs. Apache Spark has become a leading platform...]]>

The world of big data analytics is constantly seeking ways to accelerate processing and reduce infrastructure costs. Apache Spark has become a leading platform for scale-out analytics, handling massive datasets for ETL, machine learning, and deep learning workloads. While traditionally CPU-based, the advent of GPU acceleration offers a compelling promise: significant speedups for data processing…

Source

]]>
Brad Nemire <![CDATA[Get Trained and Certified at GTC Paris at VivaTech 2025]]> http://www.open-lab.net/blog/?p=100034 2025-05-15T19:07:25Z 2025-05-14T16:16:06Z Join us at GTC Paris on June 10th and choose from six full-day, instructor-led workshops.]]>

Join us at GTC Paris on June 10th and choose from six full-day, instructor-led workshops.

Source

]]>
Jaydeep Marathe <![CDATA[CUDA C++ Compiler Updates Impacting ELF Visibility and Linkage]]> http://www.open-lab.net/blog/?p=99693 2025-05-15T19:07:32Z 2025-05-09T16:51:02Z In the next CUDA major release, CUDA 13.0, NVIDIA is introducing two significant changes to the NVIDIA CUDA Compiler Driver (NVCC) that will impact ELF...]]>

Source

]]>
Jonathan Bentz <![CDATA[Just Released: CUDA 12.9]]> http://www.open-lab.net/blog/?p=99599 2025-05-15T19:07:49Z 2025-05-05T15:39:54Z New features include enhancements to confidential computing and family-specific features and targets supported by NVCC.]]>

New features include enhancements to confidential computing and family-specific features and targets supported by NVCC.

Source

]]>
Mark Harris <![CDATA[An Even Easier Introduction to CUDA (Updated)]]> http://www.open-lab.net/blog/parallelforall/?p=7501 2025-05-15T19:08:25Z 2025-05-02T17:31:00Z Note: This blog post was originally published on Jan 25, 2017, but has been edited to reflect new updates. This post is a super simple introduction to CUDA, the...]]>

Source

]]>
Brad Nemire <![CDATA[HackAI Challenge Winners Announced]]> http://www.open-lab.net/blog/?p=99563 2025-05-15T19:08:27Z 2025-05-02T16:31:11Z Explore the groundbreaking projects and real-world impacts of the HackAI Challenge powered by NVIDIA AI Workbench and Dell Precision.]]>

Explore the groundbreaking projects and real-world impacts of the HackAI Challenge powered by NVIDIA AI Workbench and Dell Precision.

Source

]]>
Jonathan Bentz <![CDATA[NVIDIA Blackwell and NVIDIA CUDA 12.9 Introduce Family-Specific Architecture Features]]> http://www.open-lab.net/blog/?p=98753 2025-05-15T19:08:27Z 2025-05-01T22:39:39Z One of the earliest architectural design decisions that went into the CUDA platform for NVIDIA GPUs was support for backward compatibility of GPU code. This...]]>

Source

]]>
Allison Ding <![CDATA[Stacking Generalization with HPO: Maximize Accuracy in 15 Minutes with NVIDIA cuML]]> http://www.open-lab.net/blog/?p=99417 2025-05-15T19:08:30Z 2025-05-01T18:35:18Z Stacking generalization is a widely used technique among machine learning (ML) engineers, where multiple models are combined to boost overall predictive...]]>

Stacking generalization is a widely used technique among machine learning (ML) engineers, where multiple models are combined to boost overall predictive performance. On the other hand, hyperparameter optimization (HPO) involves systematically searching for the best set of hyperparameters to maximize the performance of a given ML algorithm. A common challenge when using both stacking and HPO…
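The combination can be sketched in a few lines of plain Python — an illustrative toy, not the cuML API: each base model has one hyperparameter tuned by grid search (the HPO step), and a stacked model then combines the tuned base predictions. Here the "meta-learner" is a simple average; a real stack would fit it on out-of-fold predictions.

```python
# Toy 1-D regression dataset: y = 2x.
X = [float(i) for i in range(20)]
y = [2.0 * x for x in X]

def make_scaled_model(scale):
    """Base 'model': predicts scale * x. Stands in for a fitted regressor."""
    return lambda x: scale * x

def mse(model, X, y):
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(X)

def tune(scales, X, y):
    """Tiny HPO loop: grid-search one hyperparameter per base model."""
    return min((make_scaled_model(s) for s in scales), key=lambda m: mse(m, X, y))

# Level 0: two independently tuned base models.
m1 = tune([1.5, 1.9, 2.1], X, y)
m2 = tune([1.0, 2.0, 3.0], X, y)

# Level 1: the stacked model averages the base predictions.
def stacked(x):
    return 0.5 * (m1(x) + m2(x))
```

With GPU-accelerated training via cuML, the expensive part — fitting each base model many times during the search — is what gets accelerated; the stacking logic itself is unchanged.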

Source

]]>
Jenn Yonemitsu <![CDATA[Kaggle Grandmasters Unveil Winning Strategies for Data Science Superpowers]]> http://www.open-lab.net/blog/?p=99350 2025-05-15T19:08:33Z 2025-04-29T17:22:59Z Kaggle Grandmasters David Austin and Chris Deotte from NVIDIA and Ruchi Bhatia from HP joined Brenda Flynn from Kaggle at this year's Google Cloud Next...]]>

Kaggle Grandmasters David Austin and Chris Deotte from NVIDIA and Ruchi Bhatia from HP joined Brenda Flynn from Kaggle at this year’s Google Cloud Next conference in Las Vegas. They shared a bit about who they are, what motivates them to compete, and how they contribute to and win competitions on the world’s largest data science competition platform. This blog post captures a glimpse of…

Source

]]>
Jean-Eudes Marvie <![CDATA[Real-Time GPU-Accelerated Gaussian Splatting with NVIDIA DesignWorks Sample vk_gaussian_splatting]]> http://www.open-lab.net/blog/?p=98796 2025-05-15T19:08:43Z 2025-04-23T20:00:00Z Gaussian splatting is a novel approach to rendering complex 3D scenes by representing them as a collection of anisotropic Gaussians in 3D space. This technique...]]>

Gaussian splatting is a novel approach to rendering complex 3D scenes by representing them as a collection of anisotropic Gaussians in 3D space. This technique enables real-time rendering of photorealistic scenes learned from small sets of images, making it ideal for applications in gaming, virtual reality, and real-time professional visualization. vk_gaussian_splatting is a new Vulkan-based…
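The primitive underneath the technique is small: each splat contributes `opacity * exp(-0.5 * d^T Σ⁻¹ d)` to the pixels it covers, where Σ is the Gaussian's (anisotropic) covariance. A minimal sketch in plain Python — evaluating one 2D Gaussian at a pixel, not the vk_gaussian_splatting API:

```python
import math

def gaussian_weight(px, py, mean, cov_inv, opacity):
    """Weight of one anisotropic 2D Gaussian at pixel (px, py).

    cov_inv holds the symmetric inverse covariance as (a, b, c) for
    [[a, b], [b, c]]; anisotropy comes from unequal a/c and nonzero b.
    """
    dx, dy = px - mean[0], py - mean[1]
    a, b, c = cov_inv
    power = -0.5 * (a * dx * dx + 2.0 * b * dx * dy + c * dy * dy)
    return opacity * math.exp(power)

# Isotropic special case (cov = I): full opacity at the mean,
# falling off with distance.
w_center = gaussian_weight(5.0, 5.0, (5.0, 5.0), (1.0, 0.0, 1.0), 0.8)
w_off = gaussian_weight(6.0, 5.0, (5.0, 5.0), (1.0, 0.0, 1.0), 0.8)
```

A renderer sorts splats by depth and alpha-blends these weights per pixel; doing that for millions of Gaussians in real time is where the GPU (and Vulkan, in this sample) comes in.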

Source

]]>
Bo Dong <![CDATA[NVIDIA cuPyNumeric 25.03 Now Fully Open Source with PIP and HDF5 Support]]> http://www.open-lab.net/blog/?p=99089 2025-05-15T19:08:44Z 2025-04-23T19:26:07Z NVIDIA cuPyNumeric is a library that aims to provide a distributed and accelerated drop-in replacement for NumPy built on top of the Legate framework. It brings...]]>

NVIDIA cuPyNumeric is a library that aims to provide a distributed and accelerated drop-in replacement for NumPy built on top of the Legate framework. It brings zero-code-change scaling to multi-GPU and multinode (MGMN) accelerated computing. cuPyNumeric 25.03 is a milestone update that introduces powerful new capabilities and enhanced accessibility for users and developers alike…
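The "drop-in replacement" claim is the key design point: user code written against the NumPy API never names a backend, so the implementation can be swapped without source changes. A stand-in sketch of that pattern (two toy backends exposing the same tiny API surface — not cuPyNumeric itself, which you would get via `import cupynumeric`):

```python
class PyListArrays:
    """Reference backend: plain Python lists."""
    @staticmethod
    def asarray(seq):
        return list(seq)
    @staticmethod
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

class ChunkedArrays:
    """Stand-in 'accelerated' backend: same API, different internals.
    Here it just splits the work in half, the way a distributed
    backend might shard an array across devices."""
    @staticmethod
    def asarray(seq):
        return list(seq)
    @staticmethod
    def dot(a, b):
        mid = len(a) // 2
        return (sum(x * y for x, y in zip(a[:mid], b[:mid]))
                + sum(x * y for x, y in zip(a[mid:], b[mid:])))

def user_code(np):
    # The numerical code only touches the module-like object it is
    # handed, so swapping backends requires no source changes.
    a = np.asarray([1.0, 2.0, 3.0, 4.0])
    return np.dot(a, a)

assert user_code(PyListArrays) == user_code(ChunkedArrays) == 30.0
```

cuPyNumeric applies the same idea at the level of the full NumPy API, with Legate handling the multi-GPU/multi-node partitioning behind it.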

Source

]]>
Maximilian Müller <![CDATA[Optimizing Transformer-Based Diffusion Models for Video Generation with NVIDIA TensorRT]]> http://www.open-lab.net/blog/?p=98927 2025-05-15T19:08:48Z 2025-04-21T18:44:38Z State-of-the-art image diffusion models take tens of seconds to process a single image. This makes video diffusion even more challenging, requiring significant...]]>

State-of-the-art image diffusion models take tens of seconds to process a single image. This makes video diffusion even more challenging, requiring significant computational resources and high costs. By leveraging the latest FP8 quantization features on NVIDIA Hopper GPUs with NVIDIA TensorRT, it’s possible to significantly reduce inference costs and serve more users with fewer GPUs.
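The core of FP8 quantization is mapping high-precision values into a narrow representable range via a per-tensor scale. A simplified pure-Python simulation of that scale/round/clamp/dequantize round trip — illustrative only (real FP8 rounds the mantissa rather than an integer grid, and TensorRT handles calibration and kernel selection for you):

```python
def fake_quantize(values, amax, qmax=448.0):
    """Simulate FP8-style quantization. qmax=448 mirrors the max
    representable value of the FP8 E4M3 format; amax is the observed
    absolute maximum of the tensor (from calibration)."""
    scale = qmax / amax
    out = []
    for v in values:
        q = max(-qmax, min(qmax, round(v * scale)))  # quantize + clamp
        out.append(q / scale)                        # dequantize
    return out

acts = [0.5, -1.25, 3.0, 10.0]
deq = fake_quantize(acts, amax=10.0)
```

The inference win comes from running matmuls in the low-precision format on Hopper's FP8 Tensor Cores; the dequantized values stay close enough to the originals that image quality is preserved.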

Source

]]>
Daniel Rodriguez <![CDATA[Announcing ComputeEval, an Open-Source Framework for Evaluating LLMs on CUDA]]> http://www.open-lab.net/blog/?p=98885 2025-05-15T19:08:55Z 2025-04-16T16:48:07Z Large language models (LLMs) are revolutionizing how developers code and how they learn to code. For seasoned or junior developers alike, today's...]]>

Large language models (LLMs) are revolutionizing how developers code and how they learn to code. For seasoned or junior developers alike, today’s state-of-the-art models can generate Python scripts, React-based websites, and more. In the future, powerful AI models will assist developers in writing high-performance GPU code. This raises an important question: How can it be determined whether an LLM…
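Benchmarks of this kind typically measure functional correctness: run each model-generated solution against hidden unit tests and count how many pass. A minimal sketch of such a harness in plain Python — not ComputeEval's actual interface, which targets CUDA and would compile and launch kernels in a sandbox rather than `exec`'ing Python source:

```python
def evaluate(candidate_src, tests):
    """Run one candidate solution against unit tests; return pass fraction."""
    ns = {}
    try:
        exec(candidate_src, ns)  # "compile" the candidate
    except Exception:
        return 0.0  # candidate doesn't even build
    passed = 0
    for test in tests:
        try:
            test(ns)
            passed += 1
        except Exception:
            pass
    return passed / len(tests)

def t_basic(ns):
    assert ns["add"](1, 2) == 3

def t_negative(ns):
    assert ns["add"](-1, 1) == 0

good = "def add(a, b):\n    return a + b\n"
buggy = "def add(a, b):\n    return abs(a) + b\n"
```

Scoring many candidates per problem then yields pass@k-style metrics, which is how progress on GPU-code generation can be tracked over time.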

Source

]]>
Matt Ahrens <![CDATA[Accelerating Apache Parquet Scans on Apache Spark with GPUs]]> http://www.open-lab.net/blog/?p=98350 2025-04-22T23:57:50Z 2025-04-03T16:18:03Z As data sizes have grown in enterprises across industries, Apache Parquet has become a prominent format for storing data. Apache Parquet is a columnar storage...]]>

As data sizes have grown in enterprises across industries, Apache Parquet has become a prominent format for storing data. Apache Parquet is a columnar storage format designed for efficient data processing at scale. By organizing data by columns rather than rows, Parquet enables high-performance querying and analysis, as it can read only the necessary columns for a query instead of scanning entire…
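Column pruning is the property that makes this fast. A stdlib-only illustration of the idea — columns stored contiguously and a scan that touches only what the query needs (not the actual Parquet or Spark API, which adds row groups, encodings, and predicate pushdown on top):

```python
# Columnar table: each column stored contiguously, like a Parquet
# column chunk, instead of row-by-row records.
table = {
    "user_id": [1, 2, 3, 4],
    "country": ["US", "DE", "US", "FR"],
    "revenue": [10.0, 20.0, 30.0, 40.0],
}

def scan(table, columns):
    """Column-pruned scan: read only the requested columns."""
    return {c: table[c] for c in columns}

# A query like SELECT SUM(revenue) reads one column, not the whole table.
pruned = scan(table, ["revenue"])
total = sum(pruned["revenue"])
```

On a GPU, the same pruned column chunks can be decoded and aggregated massively in parallel, which is where the RAPIDS Accelerator for Spark gets its scan speedups.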

Source

]]>
Ashraf Eassa <![CDATA[NVIDIA Blackwell Delivers Massive Performance Leaps in MLPerf Inference v5.0]]> http://www.open-lab.net/blog/?p=98367 2025-04-23T19:41:12Z 2025-04-02T18:14:48Z The compute demands for large language model (LLM) inference are growing rapidly, fueled by the combination of growing model sizes, real-time latency...]]>

The compute demands for large language model (LLM) inference are growing rapidly, fueled by the combination of growing model sizes, real-time latency requirements, and, most recently, AI reasoning. At the same time, as AI adoption grows, the ability of an AI factory to serve as many users as possible, all while maintaining good per-user experiences, is key to maximizing the value it generates.

Source

]]>
Pradyumna Desale <![CDATA[Automating AI Factories with NVIDIA Mission Control]]> http://www.open-lab.net/blog/?p=98012 2025-04-03T18:47:00Z 2025-03-25T18:45:11Z Advanced AI models such as DeepSeek-R1 are proving that enterprises can now build cutting-edge AI models specialized with their own data and expertise. These...]]>

Advanced AI models such as DeepSeek-R1 are proving that enterprises can now build cutting-edge AI models specialized with their own data and expertise. These models can be tailored to unique use cases, tackling diverse challenges like never before. Based on the success of early AI adopters, many organizations are shifting their focus to full-scale production AI factories. Yet the process of…

Source

]]>
Andrew Fear <![CDATA[NVIDIA Demonstrates GeForce NOW for Game AI Inference and Streamlined Hands-on Opportunities]]> http://www.open-lab.net/blog/?p=97825 2025-04-17T18:17:43Z 2025-03-20T17:34:38Z NVIDIA cloud gaming service GeForce NOW is providing developers and publishers with new tools to bring their games to more gamers—and offer new experiences...]]>

NVIDIA cloud gaming service GeForce NOW is providing developers and publishers with new tools to bring their games to more gamers—and offer new experiences only possible through the cloud. These tools lower local GPU requirements to expand reach and eliminate cost by offloading AI inference tasks to the cloud. At the Game Developers Conference (GDC) 2025, NVIDIA demonstrated hybrid AI…

Source

]]>
Amr Elmeleegy <![CDATA[Introducing NVIDIA Dynamo, A Low-Latency Distributed Inference Framework for Scaling Reasoning AI Models]]> http://www.open-lab.net/blog/?p=95274 2025-04-23T00:15:55Z 2025-03-18T17:50:00Z NVIDIA announced the release of NVIDIA Dynamo today at GTC 2025. NVIDIA Dynamo is a high-throughput, low-latency open-source inference serving framework for...]]>

NVIDIA announced the release of NVIDIA Dynamo today at GTC 2025. NVIDIA Dynamo is a high-throughput, low-latency open-source inference serving framework for deploying generative AI and reasoning models in large-scale distributed environments. The framework boosts the number of requests served by up to 30x, when running the open-source DeepSeek-R1 models on NVIDIA Blackwell.

Source

]]>
Tony Scudiero <![CDATA[Understanding PTX, the Assembly Language of CUDA GPU Computing]]> http://www.open-lab.net/blog/?p=96891 2025-04-23T00:32:55Z 2025-03-12T18:00:00Z Parallel thread execution (PTX) is a virtual machine instruction set architecture that has been part of CUDA from its beginning. You can think of PTX as the...]]>

Parallel thread execution (PTX) is a virtual machine instruction set architecture that has been part of CUDA from its beginning. You can think of PTX as the assembly language of the NVIDIA CUDA GPU computing platform. In this post, we’ll explain what that means, what PTX is for, and what you need to know about it to make the most of CUDA for your applications. We’ll start by walking through…

Source

]]>
Nikhil Gupta <![CDATA[Optimizing Compile Times for CUDA C++]]> http://www.open-lab.net/blog/?p=96775 2025-04-23T00:36:07Z 2025-03-10T18:02:27Z In modern software development, time is an incredibly valuable resource, especially during the compilation process. For developers working with CUDA C++ on...]]>

In modern software development, time is an incredibly valuable resource, especially during the compilation process. For developers working with CUDA C++ on large-scale GPU-accelerated applications, optimizing compile times can significantly enhance productivity and streamline the entire development cycle. When using the compiler for offline compilation, efficient compilation times enable…

Source

]]>
Shelby Thomas <![CDATA[Ensuring Reliable Model Training on NVIDIA DGX Cloud]]> http://www.open-lab.net/blog/?p=96789 2025-03-24T18:36:43Z 2025-03-10T16:26:44Z Training AI models on massive GPU clusters presents significant challenges for model builders. Because manual intervention becomes impractical as job scale...]]>

Training AI models on massive GPU clusters presents significant challenges for model builders. Because manual intervention becomes impractical as job scale increases, automation is critical to maintaining high GPU utilization and training productivity. An exceptional training experience requires resilient systems that provide low-latency error attribution and automatic failover based on root…

Source

]]>
Anton Anders <![CDATA[NVIDIA cuDSS Advances Solver Technologies for Engineering and Scientific Computing]]> http://www.open-lab.net/blog/?p=96466 2025-04-23T02:36:28Z 2025-02-25T18:30:56Z NVIDIA cuDSS is a first-generation sparse direct solver library designed to accelerate engineering and scientific computing. cuDSS is increasingly adopted in...]]>

NVIDIA cuDSS is a first-generation sparse direct solver library designed to accelerate engineering and scientific computing. cuDSS is increasingly adopted in data centers and other environments and supports single-GPU, multi-GPU and multi-node (MGMN) configurations. cuDSS has become a key tool for accelerating computer-aided engineering (CAE) workflows and scientific computations across…
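What a sparse direct solver does — factorize, then solve exactly by elimination rather than iterate to convergence — can be shown on the smallest possible case. A 2×2 elimination in plain Python, purely illustrative (cuDSS does this at scale via sparse LU/Cholesky factorization on the GPU):

```python
def solve2(A, b):
    """Direct solve of a 2x2 linear system A x = b by elimination.
    A is [[a11, a12], [a21, a22]]; assumes A is nonsingular."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    x = (a22 * b[0] - a12 * b[1]) / det
    y = (a11 * b[1] - a21 * b[0]) / det
    return x, y

# 4x + y = 1;  x + 3y = 2
x, y = solve2([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

CAE workloads produce systems with millions of unknowns but mostly-zero matrices; exploiting that sparsity during factorization is the hard part the library accelerates.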

Source

]]>
Sama Bali <![CDATA[Transforming Product Design Workflows in Manufacturing with Generative AI]]> http://www.open-lab.net/blog/?p=96242 2025-04-23T02:42:26Z 2025-02-20T19:32:11Z Traditional design and engineering workflows in the manufacturing industry have long been characterized by a sequential, iterative approach that is often...]]>

Traditional design and engineering workflows in the manufacturing industry have long been characterized by a sequential, iterative approach that is often time-consuming and resource intensive. These conventional methods typically involve stages such as requirement gathering, conceptual design, detailed design, analysis, prototyping, and testing, with each phase dependent on the results of previous…

Source

]]>
Terry Chen <![CDATA[Automating GPU Kernel Generation with DeepSeek-R1 and Inference Time Scaling]]> http://www.open-lab.net/blog/?p=95998 2025-04-23T02:45:39Z 2025-02-12T18:00:00Z As AI models extend their capabilities to solve more sophisticated challenges, a new scaling law known as test-time scaling or inference-time scaling is...]]>

As AI models extend their capabilities to solve more sophisticated challenges, a new scaling law known as test-time scaling or inference-time scaling is emerging. Also known as AI reasoning or long-thinking, this technique improves model performance by allocating additional computational resources during inference to evaluate multiple possible outcomes and then selecting the best one…
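The simplest form of inference-time scaling is best-of-N sampling: spend extra compute drawing several candidate outputs, score them, and keep the best. A toy sketch in plain Python, with a random stand-in for the model and a scalar stand-in for the verifier (not the actual workflow from the post, which uses DeepSeek-R1 to generate and refine GPU kernels):

```python
import random

def generate(rng):
    """Stand-in for sampling one model output: returns (answer, score).
    In practice the score comes from a verifier or reward model."""
    q = rng.random()
    return f"candidate-{q:.3f}", q

def best_of_n(n, seed=0):
    """Inference-time scaling: more samples, better expected best score."""
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n)]
    return max(candidates, key=lambda c: c[1])

ans1, q1 = best_of_n(1)
ans16, q16 = best_of_n(16)
```

Because the 16-sample pool (on the same seed stream) contains the single-sample draw, its best score can never be worse — which is the whole argument for allocating more compute at inference time.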

Source

]]>
Sama Bali <![CDATA[GPU Memory Essentials for AI Performance]]> http://www.open-lab.net/blog/?p=94979 2025-01-23T19:54:24Z 2025-01-15T16:00:00Z Generative AI has revolutionized how people bring ideas to life, and agentic AI represents the next leap forward in this technological evolution. By leveraging...]]>

Generative AI has revolutionized how people bring ideas to life, and agentic AI represents the next leap forward in this technological evolution. By leveraging sophisticated, autonomous reasoning and iterative planning, AI agents can tackle complex, multistep problems with remarkable efficiency. As AI continues to revolutionize industries, the demand for running AI models locally has surged.
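A common back-of-the-envelope for running models locally: VRAM needed is roughly parameter count times bytes per parameter, plus some overhead for activations and KV cache. A hedged sketch — the 20% overhead factor is a rough assumption, not an official NVIDIA sizing formula:

```python
def model_vram_gb(n_params_billion, bits_per_param, overhead=1.2):
    """Rough VRAM estimate in GB: parameter bytes times an assumed
    ~20% overhead for activations and KV cache."""
    bytes_total = n_params_billion * 1e9 * bits_per_param / 8
    return bytes_total * overhead / 1e9

# An 8B-parameter model: FP16 vs. 4-bit quantized weights.
fp16_gb = model_vram_gb(8, 16)  # roughly 19 GB
int4_gb = model_vram_gb(8, 4)   # roughly 5 GB
```

Arithmetic like this is why quantization matters for local inference: halving the bits per parameter halves the weight footprint, often the difference between fitting on a workstation GPU or not.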

Source

]]>
Keith Morley <![CDATA[Efficient Ray Tracing with NVIDIA OptiX Shader Binding Table Optimization]]> http://www.open-lab.net/blog/?p=93527 2024-12-17T19:24:56Z 2024-12-17T19:24:53Z NVIDIA OptiX is the API for GPU-accelerated ray tracing with CUDA, and is often used to render scenes containing a wide variety of objects and materials. During...]]>

NVIDIA OptiX is the API for GPU-accelerated ray tracing with CUDA, and is often used to render scenes containing a wide variety of objects and materials. During an OptiX launch, when a ray intersects a geometric primitive, a hit shader is executed. The question of which shader is executed for a given intersection is answered by the Shader Binding Table (SBT). The SBT may also be used to map input…
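The SBT's job can be sketched as a flat array of records, each pairing a hit shader with its per-material data, selected by an index computed from offsets and the ray type. A toy dispatch in plain Python — the record layout and index arithmetic here are simplified stand-ins, not the exact OptiX formula, which also folds in instance and GAS offsets:

```python
# Each SBT record pairs a hit "shader" with its per-material data.
def metal_hit(data):
    return f"metal:{data['roughness']}"

def glass_hit(data):
    return f"glass:{data['ior']}"

sbt = [
    {"shader": metal_hit, "data": {"roughness": 0.3}},  # record 0
    {"shader": glass_hit, "data": {"ior": 1.5}},        # record 1
]

def on_hit(sbt_offset, ray_type, stride=1):
    """SBT-style record selection: index = offset + ray_type * stride."""
    rec = sbt[sbt_offset + ray_type * stride]
    return rec["shader"](rec["data"])

result = on_hit(sbt_offset=1, ray_type=0)
```

Optimizing the SBT is largely about organizing these records so many objects share shaders and data, keeping the table small and the lookups coherent.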

Source

]]>
Michelle Horton <![CDATA[Top Posts of 2024 Highlight NVIDIA NIM, LLM Breakthroughs, and Data Science Optimization]]> http://www.open-lab.net/blog/?p=93566 2024-12-16T18:34:16Z 2024-12-16T18:34:14Z 2024 was another landmark year for developers, researchers, and innovators working with NVIDIA technologies. From groundbreaking developments in AI inference to...]]>

2024 was another landmark year for developers, researchers, and innovators working with NVIDIA technologies. From groundbreaking developments in AI inference to empowering open-source contributions, these blog posts highlight the breakthroughs that resonated most with our readers. NVIDIA NIM Offers Optimized Inference Microservices for Deploying AI Models at Scale Introduced in…

Source

]]>
Shijie Liu <![CDATA[Boost Large-Scale Recommendation System Training Embedding Using EMBark]]> http://www.open-lab.net/blog/?p=92378 2024-12-05T01:24:55Z 2024-11-20T17:09:08Z Recommendation systems are core to the Internet industry, and efficiently training them is a key issue for various companies. Most recommendation systems are...]]>

Recommendation systems are core to the Internet industry, and efficiently training them is a key issue for various companies. Most recommendation systems are deep learning recommendation models (DLRMs), containing billions or even tens of billions of ID features. Figure 1 shows a typical structure. In recent years, GPU solutions such as NVIDIA Merlin HugeCTR and TorchRec have…
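The operation dominating DLRM training is the embedding lookup: map sparse ID features into dense vectors and pool them. A stdlib-only sketch of that core op — a real table has billions of rows sharded across GPUs, which is exactly the problem EMBark targets:

```python
# Tiny embedding table: id -> 4-dim vector. A production DLRM table
# holds billions of IDs and is sharded across many GPUs.
DIM = 4
table = {i: [0.1 * i] * DIM for i in range(10)}

def embed_bag(ids):
    """Sum-pool the embeddings of one multi-hot sparse feature."""
    pooled = [0.0] * DIM
    for i in ids:
        pooled = [p + v for p, v in zip(pooled, table[i])]
    return pooled

pooled = embed_bag([1, 2, 3])  # each component sums 0.1 + 0.2 + 0.3
```

Because each lookup is tiny but there are billions per batch, throughput depends on how the table is partitioned and how lookups are batched across devices — the scheduling problem EMBark's training optimizations address.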

Source

]]>
Michelle Horton <![CDATA[Deep Learning AI Model Identifies Breast Cancer Spread without Surgery]]> http://www.open-lab.net/blog/?p=91133 2024-12-20T18:48:46Z 2024-10-31T16:06:07Z A new deep learning model could reduce the need for surgery when diagnosing whether cancer cells are spreading, including to nearby lymph nodes—also known as...]]>

A new deep learning model could reduce the need for surgery when diagnosing whether cancer cells are spreading, including to nearby lymph nodes—also known as metastasis. Developed by researchers from the University of Texas Southwestern Medical Center, the AI tool analyzes time-series MRIs and clinical data to identify metastasis, providing crucial, noninvasive support for doctors in treatment…

Source

]]>
Michelle Horton <![CDATA[Maximizing Energy and Power Efficiency in Applications with NVIDIA GPUs]]> http://www.open-lab.net/blog/?p=90100 2024-10-30T18:55:08Z 2024-10-16T16:50:10Z As the demand for high-performance computing (HPC) and AI applications grows, so does the importance of energy efficiency. NVIDIA Principal Developer Technology...]]>

As the demand for high-performance computing (HPC) and AI applications grows, so does the importance of energy efficiency. NVIDIA Principal Developer Technology Engineer, Alan Gray, shares insights on optimizing energy and power efficiency for various applications running on the latest NVIDIA technologies, including NVIDIA H100 Tensor Core GPUs and NVIDIA DGX A100 systems. Traditionally…

Source

]]>
Charlie Huang <![CDATA[Scale High-Performance AI Inference with Google Kubernetes Engine and NVIDIA NIM]]> http://www.open-lab.net/blog/?p=90198 2024-10-30T18:57:03Z 2024-10-16T16:30:00Z The rapid evolution of AI models has driven the need for more efficient and scalable inferencing solutions. As organizations strive to harness the power of AI,...]]>

The rapid evolution of AI models has driven the need for more efficient and scalable inferencing solutions. As organizations strive to harness the power of AI, they face challenges in deploying, managing, and scaling AI inference workloads. NVIDIA NIM and Google Kubernetes Engine (GKE) together offer a powerful solution to address these challenges. NVIDIA has collaborated with Google Cloud to…

Source

]]>