Modern high-performance computing (HPC) is enabling more than just quick calculations: it's powering AI systems that are unlocking scientific breakthroughs. HPC has gone through many iterations, each sparked by a creative repurposing of technologies. For example, early supercomputers used off-the-shelf components. Researchers later built powerful clusters from personal computers and even…
AI is becoming the cornerstone of innovation across industries, driving new levels of creativity and productivity and fundamentally reshaping how we live and work. And it's enabled by a new type of infrastructure, the AI factory, which manufactures intelligence at scale and creates the foundation for what many consider the next industrial revolution. AI factories represent a reset of traditional…
In the previous post, Profiling LLM Training Workflows on NVIDIA Grace Hopper, we explored the importance of profiling large language model (LLM) training workflows and analyzed bottlenecks using NVIDIA Nsight Systems. We also discussed how the NVIDIA GH200 Grace Hopper Superchip enables efficient training processes. While profiling helps identify inefficiencies…
The rapid advancements in AI have resulted in an era of exponential growth in model sizes, particularly in the domain of large language models (LLMs). These models, with their transformative capabilities, are driving innovation across industries. However, the increasing complexity and computational demands of training such models necessitate a meticulous approach to optimization and profiling.
The NVIDIA CUDA-X math libraries empower developers to build accelerated applications for AI, scientific computing, data processing, and more. Two of the most important applications of CUDA-X libraries are training and inference of LLMs, whether for use in everyday consumer applications or highly specialized scientific domains like drug discovery. Multiple CUDA-X libraries are indispensable…
The compute demands for large language model (LLM) inference are growing rapidly, fueled by the combination of growing model sizes, real-time latency requirements, and, most recently, AI reasoning. At the same time, as AI adoption grows, the ability of an AI factory to serve as many users as possible, all while maintaining good per-user experiences, is key to maximizing the value it generates.
In an effort to rein in illicit fishing, researchers have unveiled a new open-source AI model that can accurately identify what virtually all of the world's seafaring vessels are doing, including whether a boat is potentially fishing illegally. Seattle-based Ai2 (the Allen Institute for AI) recently released a lightweight model named Atlantes to analyze more than five billion GPS signals a…
AI agents are transforming business operations by automating processes, optimizing decision-making, and streamlining actions. Their effectiveness hinges on expert reasoning, enabling smarter planning and efficient execution. Agentic AI applications could benefit from the capabilities of models such as DeepSeek-R1. Built for solving problems that require advanced AI reasoning…
As of March 18, 2025, NVIDIA Triton Inference Server is now part of the NVIDIA Dynamo Platform and has been renamed to NVIDIA Dynamo Triton, accordingly. NAVER is a popular South Korean search engine company that offers Naver Place, a geo-based service that provides detailed information about millions of businesses and points of interest across Korea. Users can search for different places…
Supercomputers are the engines of groundbreaking discoveries. From predicting extreme weather to advancing disease research and designing safer, more efficient infrastructures, these machines simulate complex systems that are impractical to test in the real world due to their size, cost, and material requirements. Since the introduction of the GPU in 1999, NVIDIA has continually pushed the…
NVIDIA Enterprise Reference Architectures (Enterprise RAs) can reduce the time and cost of deploying AI infrastructure solutions. They provide a streamlined approach for building flexible and cost-effective accelerated infrastructure while ensuring compatibility and interoperability. The latest Enterprise RA details an optimized cluster configuration for systems integrated with NVIDIA GH200…
As AI models extend their capabilities to solve more sophisticated challenges, a new scaling law known as test-time scaling or inference-time scaling is emerging. Also known as AI reasoning or long-thinking, this technique improves model performance by allocating additional computational resources during inference to evaluate multiple possible outcomes and then selecting the best one…
The NVIDIA Grace CPU is transforming data center design by offering a new level of power-efficient performance. Built specifically for data center scale, the Grace CPU is designed to handle demanding workloads while consuming less power. NVIDIA believes in the benefit of leveraging GPUs to accelerate every workload. However, not all workloads are accelerated. This is especially true for those…
Matrix multiplication and attention mechanisms are the computational backbone of modern AI workloads. While libraries like NVIDIA cuDNN provide highly optimized implementations, and frameworks such as CUTLASS offer deep customization, many developers and researchers need a middle ground that combines performance with programmability. The open-source Triton compiler on the NVIDIA Blackwell…
Researchers studying cancer unveiled a new AI model that provides cellular-level mapping and visualizations of cancer cells, which scientists hope can shed light on how and why certain inter-cellular relationships trigger cancers to grow. BioTuring, a San Diego-based startup, announced an AI model that can quickly create detailed visualizations of cancerous tumors at single-cell resolution.
The latest release of the CUDA Toolkit, version 12.8, continues to push accelerated computing performance in data science, AI, scientific computing, and computer graphics and simulation, using the latest NVIDIA CPUs and GPUs. This post highlights some of the new features and enhancements included with this release: CUDA Toolkit 12.8 is the first version of the Toolkit to support…
2024 was another landmark year for developers, researchers, and innovators working with NVIDIA technologies. From groundbreaking developments in AI inference to empowering open-source contributions, these blog posts highlight the breakthroughs that resonated most with our readers. NVIDIA NIM Offers Optimized Inference Microservices for Deploying AI Models at Scale Introduced in…
Last month at the Supercomputing 2024 conference, NVIDIA announced the availability of NVIDIA H200 NVL, the latest NVIDIA Hopper platform. Optimized for enterprise workloads, NVIDIA H200 NVL is a versatile platform that delivers accelerated performance for a wide range of AI and HPC applications. With its dual-slot PCIe form factor and 600W TGP, the H200 NVL enables flexible configuration options…
Accelerated computing is enabling giant leaps in performance and energy efficiency compared to traditional CPU computing. Delivering these advancements requires full-stack innovation at data-center scale, spanning chips, systems, networking, software, and algorithms. Choosing the right architecture for the right workload with the best energy efficiency is critical to maximizing the performance and…
In the wake of ever-growing power demands, power systems optimization (PSO) of power grids is crucial for ensuring efficient resource management, sustainability, and energy security. The Eastern Interconnection, a major North American power grid, consists of approximately 70K nodes (Figure 1). Aside from sheer size, optimizing such a grid is complicated by uncertainties such as catastrophic…
Meta recently released its Llama 3.2 series of vision language models (VLMs), which come in 11B parameter and 90B parameter variants. These models are multimodal, supporting both text and image inputs. In addition, Meta has launched text-only small language model (SLM) variants of Llama 3.2 with 1B and 3B parameters. NVIDIA has optimized the Llama 3.2 collection of models for great performance and…
Confidential and self-sovereign AI is a new approach to AI development, training, and inference where the user's data is decentralized, private, and controlled by the users themselves. This post explores how the capabilities of Confidential Computing (CC) are expanded through decentralization using blockchain technology. The problem being solved is most clearly shown through the use of…
Generative AI has the ability to create entirely new content that traditional machine learning (ML) methods struggle to produce. In the field of natural language processing (NLP), the advent of large language models (LLMs) specifically has led to many innovative and creative AI use cases. These include customer support chatbots, voice assistants, text summarization and translation…
As the demand for high-performance computing (HPC) and AI applications grows, so does the importance of energy efficiency. NVIDIA Principal Developer Technology Engineer Alan Gray shares insights on optimizing energy and power efficiency for various applications running on the latest NVIDIA technologies, including NVIDIA H100 Tensor Core GPUs and NVIDIA DGX A100 systems. Traditionally…
The continued growth of LLM capabilities, fueled by increasing parameter counts and support for longer contexts, has led to their usage in a wide variety of applications, each with diverse deployment requirements. For example, a chatbot supports a small number of users at very low latencies for good interactivity. Meanwhile, synthetic data generation requires high throughput to process many items…
Many of the most exciting applications of large language models (LLMs), such as interactive speech bots, coding co-pilots, and search, need to begin responding to user queries quickly to deliver positive user experiences. The time that it takes for an LLM to ingest a user prompt (and context, which can be sizable) and begin outputting a response is called time to first token (TTFT).
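Written as a simple formula, this is just a restatement of the definition above (the symbols are illustrative, not notation from the post):

\[ \mathrm{TTFT} = t_{\text{first output token}} - t_{\text{prompt submitted}} \]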
In the latest round of MLPerf Inference, a suite of standardized, peer-reviewed inference benchmarks, the NVIDIA platform delivered outstanding performance across the board. Among the many submissions made using the NVIDIA platform were results using the NVIDIA GH200 Grace Hopper Superchip. GH200 tightly couples an NVIDIA Grace CPU with an NVIDIA Hopper GPU using NVIDIA NVLink-C2C…
Large language model (LLM) inference is a full-stack challenge. Powerful GPUs, high-bandwidth GPU-to-GPU interconnects, efficient acceleration libraries, and a highly optimized inference engine are required for high-throughput, low-latency inference. MLPerf Inference v4.1 is the latest version of the popular and widely recognized MLPerf Inference benchmarks, developed by the MLCommons…
With the rapid growth of generative AI, CIOs and IT leaders are looking for ways to reclaim data center resources to accommodate new AI use cases that promise greater return on investment without impacting current operations. This is leading IT decision makers to reassess past infrastructure decisions and explore strategies to consolidate traditional workloads into fewer…
In the era of generative AI, vector databases have become indispensable for storing and querying high-dimensional data efficiently. However, like all databases, vector databases are vulnerable to a range of attacks, including cyber threats, phishing attempts, and unauthorized access. This vulnerability is particularly concerning considering that these databases often contain sensitive and…
Demand for data processing is growing exponentially, with data volumes projected to reach 175 zettabytes by 2025. This contrasts sharply with the slowing pace of CPU performance improvements. For more than a decade, semiconductor advancements have not kept up with the pace predicted by Moore's Law, leading to a pressing need for more efficient computing solutions. NVIDIA GPUs have emerged as the most efficient…
As large language models (LLMs) continue to grow in size and complexity, the performance requirements for serving them quickly and cost-effectively continue to grow. Delivering high LLM inference performance requires an efficient parallel computing architecture and a flexible and highly optimized software stack. Recently, NVIDIA Hopper GPUs running NVIDIA TensorRT-LLM inference software set…
Featured in Nature, this post delves into how GPUs and other advanced technologies are meeting the computational challenges posed by AI.
The latest release of the NVIDIA cuBLAS library, version 12.5, continues to deliver functionality and performance to deep learning (DL) and high-performance computing (HPC) workloads. This post provides an overview of the following updates to cuBLAS matrix multiplications (matmuls) since version 12.0, and a walkthrough: Grouped GEMM APIs can be viewed as a generalization of the batched…
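To make the grouped GEMM idea concrete, here is a rough CPU reference sketch of what such an operation computes, following the excerpt's framing that it generalizes batched GEMM by letting each group carry its own problem shape. The struct and function names are illustrative only; this is not the cuBLAS API itself.

#include <cstddef>
#include <vector>

// Hypothetical description of one group: every GEMM in a group shares a shape.
struct GemmGroup {
    int m, n, k;                      // shared problem shape for this group
    std::vector<const float*> A;      // batch of A matrices (m x k, row-major)
    std::vector<const float*> B;      // batch of B matrices (k x n, row-major)
    std::vector<float*>       C;      // batch of C matrices (m x n, row-major)
    float alpha, beta;
};

// Conceptual grouped GEMM: C = alpha * A * B + beta * C for every problem in
// every group. A batched GEMM is the special case of a single group.
void grouped_gemm_reference(const std::vector<GemmGroup>& groups) {
    for (const GemmGroup& g : groups) {
        for (std::size_t b = 0; b < g.A.size(); ++b) {
            for (int i = 0; i < g.m; ++i) {
                for (int j = 0; j < g.n; ++j) {
                    float acc = 0.0f;
                    for (int p = 0; p < g.k; ++p) {
                        acc += g.A[b][i * g.k + p] * g.B[b][p * g.n + j];
                    }
                    g.C[b][i * g.n + j] = g.alpha * acc + g.beta * g.C[b][i * g.n + j];
                }
            }
        }
    }
}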
Generative AI models have a variety of uses, such as helping write computer code, crafting stories, composing music, generating images, producing videos, and more. And, as these models continue to grow in size and are trained on even more data, they are producing even higher-quality outputs. Building and deploying these more intelligent models is incredibly compute-intensive…
NVIDIA launched the initial release of the Confidential Computing (CC) solution in private preview for early access in July 2023 through NVIDIA LaunchPad. Confidential Computing can be used in virtualized environments and provides the highest level of security with the best performance possible in the industry today. The NVIDIA H100 Tensor Core GPU was the first GPU to introduce support for CC.
Generative AI is unlocking new computing applications that greatly augment human capability, enabled by continued model innovation. Generative AI models, including large language models (LLMs), are used for crafting marketing copy, writing computer code, rendering detailed images, composing music, generating videos, and more. The amount of compute required by the latest models is immense and…
AI is augmenting high-performance computing (HPC) with novel approaches to data processing, simulation, and modeling. Because of the computational requirements of these new AI workloads, HPC is scaling up at a rapid pace. To enable applications to scale to multi-GPU and multi-node platforms, HPC tools and libraries must support that growth. NVIDIA provides a comprehensive ecosystem of…
Quantitative finance libraries are software packages that consist of mathematical, statistical, and, more recently, machine learning models designed for use in quantitative investment contexts. They contain a wide range of functionalities, often proprietary, to support the valuation, risk management, construction, and optimization of investment portfolios. Financial firms that develop such…
Large language model (LLM) applications are essential in enhancing productivity across industries through natural language. However, their effectiveness is often limited by the extent of their training data, resulting in poor performance when dealing with real-time events and new knowledge the LLM isn't trained on. Retrieval-augmented generation (RAG) solves these problems.
Best-in-class AI performance requires an efficient parallel computing architecture, a productive tool stack, and deeply optimized algorithms. NVIDIA released the open-source NVIDIA TensorRT-LLM, which includes the latest kernel optimizations for the NVIDIA Hopper architecture at the heart of the NVIDIA H100 Tensor Core GPU. These optimizations enable models like Llama 2 70B to execute using…
Large language models (LLMs) have seen dramatic growth over the last year, and delivering great user experiences depends on both high compute throughput and large amounts of high-bandwidth memory. NVIDIA TensorRT-LLM provides optimizations for both peak throughput and memory usage, delivering massive improvements in LLM inference performance.
The rapid growth in the size, complexity, and diversity of large language models (LLMs) continues to drive an insatiable need for AI training performance. Delivering top performance requires the ability to train models at the scale of an entire data center efficiently. This is achieved through exceptional craftsmanship at every layer of the technology stack, spanning chips, systems, and software.
At AWS re:Invent 2023, AWS and NVIDIA announced that AWS will be the first cloud provider to offer NVIDIA GH200 Grace Hopper Superchips interconnected with NVIDIA NVLink technology through NVIDIA DGX Cloud and running on Amazon Elastic Compute Cloud (Amazon EC2). This is a game-changing technology for cloud computing. The NVIDIA GH200 NVL32, a rack-scale solution within NVIDIA DGX Cloud or an…
High-performance computing (HPC) powers applications in simulation and modeling, healthcare and life sciences, industry and engineering, and more. In the modern data center, HPC synergizes with AI, harnessing data in transformative new ways. The performance and throughput demands of next-generation HPC applications call for an accelerated computing platform that can handle diverse workloads…
The new hardware developments in NVIDIA Grace Hopper Superchip systems enable some dramatic changes to the way developers approach GPU programming. Most notably, the bidirectional, high-bandwidth, and cache-coherent connection between CPU and GPU memory means that the user can develop their application for both processors while using a single, unified address space.
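A minimal sketch of what that single, unified address space enables, assuming a cache-coherent Grace Hopper system where the GPU can dereference memory from an ordinary system allocator (kernel and variable names are illustrative; on platforms without this coherence you would fall back to cudaMallocManaged):

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread increments one element through the same pointer the CPU uses.
__global__ void increment(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 1 << 20;

    // On a cache-coherent CPU-GPU system (e.g., GH200 with NVLink-C2C),
    // a plain system allocation can be passed directly to a kernel.
    int* data = static_cast<int*>(malloc(n * sizeof(int)));
    for (int i = 0; i < n; ++i) data[i] = i;

    increment<<<(n + 255) / 256, 256>>>(data, n);
    cudaDeviceSynchronize();

    // The CPU reads the GPU's updates through the same pointer.
    printf("data[0] = %d, data[n-1] = %d\n", data[0], data[n - 1]);
    free(data);
    return 0;
}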
Generative AI is rapidly transforming computing, unlocking new use cases and turbocharging existing ones. Large language models (LLMs), such as OpenAI's GPT models and Meta's Llama 2, skillfully perform a variety of tasks on text-based content. These tasks include summarization, translation, classification, and generation of new content such as computer code, marketing copy, poetry, and much more.
The latest release of the CUDA Toolkit continues to push the envelope of accelerated computing performance using the latest NVIDIA GPUs. New features of this release, version 12.3, include: CUDA and the CUDA Toolkit continue to provide the foundation for all accelerated computing applications in data science, machine learning and deep learning, generative AI with LLMs for both training and…
NVIDIA, working with Fujitsu and Wind River, has enabled NTT DOCOMO to launch the first GPU-accelerated commercial Open RAN 5G service in its network in Japan. This makes NTT DOCOMO the first-ever telco in the world to deploy a GPU-accelerated commercial 5G network. The announcement is a major milestone as the telecom industry strives to address the multi-billion-dollar problem of driving…
AI is transforming computing, and inference is how the capabilities of AI are deployed in the world's applications. Intelligent chatbots, image and video synthesis from simple text prompts, personalized content recommendations, and medical imaging are just a few examples of AI-powered applications. Inference workloads are both computationally demanding and diverse, requiring that platforms be…
Hardware virtualization is an effective way to isolate workloads in virtual machines (VMs) from the physical hardware and from each other. This offers improved security, particularly in a multi-tenant environment. Yet, security risks such as in-band attacks, side-channel attacks, and physical attacks can still happen, compromising the confidentiality, integrity, or availability of your data and…
At COMPUTEX 2023, NVIDIA announced the NVIDIA DGX GH200, which marks another breakthrough in GPU-accelerated computing to power the most demanding giant AI workloads. In addition to describing critical aspects of the NVIDIA DGX GH200 architecture, this post discusses how NVIDIA Base Command enables rapid deployment, accelerates the onboarding of users, and simplifies system management.
The most exciting computing applications currently rely on training and running inference on complex AI models, often in demanding, real-time deployment scenarios. High-performance, accelerated AI platforms are needed to meet the demands of these applications and deliver the best user experiences. New AI models are constantly being invented to enable new capabilities…
Confidential computing is a way of processing data in a protected zone of a computer's processor, often inside a remote edge or public cloud server, and proving that no one viewed or altered the work.
The NVIDIA H100 Tensor Core GPU, based on the NVIDIA Hopper architecture with the fourth generation of NVIDIA Tensor Cores, recently debuted, delivering unprecedented performance and sweeping AI benchmarks such as MLPerf Training. A significant fraction of operations in AI and machine learning benchmarks are general matrix multiplications (GEMMs), which are also referred to as matmul…
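For reference, a GEMM computes the standard BLAS update (this is the textbook definition, not notation taken from the post):

\[ C \leftarrow \alpha\, A B + \beta\, C, \qquad A \in \mathbb{R}^{m \times k},\; B \in \mathbb{R}^{k \times n},\; C \in \mathbb{R}^{m \times n} \]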
Learn about the newest CUDA features such as release compatibility, dynamic parallelism, lazy module loading, and support for the new NVIDIA Hopper and NVIDIA Ada Lovelace GPU architectures.
NVIDIA announces the newest CUDA Toolkit software release, 12.0. This release is the first major release in many years, and it focuses on new programming models and CUDA application acceleration through new hardware capabilities. For more information, watch the YouTube Premiere webinar, CUDA 12.0: New Features and Beyond. You can now target architecture-specific features and instructions…
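As a small, hedged illustration of what architecture-dependent targeting can look like in source code (the build command in the comment is only an example; check the CUDA 12.0 documentation for the exact flags and architecture-specific targets available on your system):

// Example build command (verify against your CUDA 12.0 install):
//   nvcc -arch=sm_90 example.cu
// Architecture-dependent code paths can be guarded on __CUDA_ARCH__.
__global__ void arch_specific_kernel(float* out) {
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 900)
    // Path compiled only for NVIDIA Hopper (compute capability 9.0) and newer.
    out[threadIdx.x] = 1.0f;
#else
    // Portable fallback for earlier architectures.
    out[threadIdx.x] = 0.0f;
#endif
}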
Dynamic programming (DP) is a well-known algorithmic technique and a mathematical optimization that has been used for several decades to solve groundbreaking problems in computer science. An example DP use case is route optimization with hundreds or thousands of constraints or weights using the Floyd-Warshall all-pairs shortest paths algorithm. Another use case is the alignment of reads for…
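As a concrete reminder of that first use case, here is a minimal CUDA sketch of the Floyd-Warshall dynamic program (an illustrative implementation with made-up function names, not the optimized DP routines the post goes on to describe):

#include <cfloat>
#include <cuda_runtime.h>

// One relaxation step of Floyd-Warshall for a fixed intermediate vertex k:
// dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])
__global__ void fw_step(float* dist, int n, int k) {
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && j < n) {
        float via_k = dist[i * n + k] + dist[k * n + j];
        if (via_k < dist[i * n + j]) dist[i * n + j] = via_k;
    }
}

// Runs the full dynamic program: n sequential steps, each fully parallel over (i, j).
// 'dist' is an n x n row-major matrix on the device, initialized with edge weights
// (0 on the diagonal, a large sentinel such as FLT_MAX / 2 for missing edges).
void floyd_warshall(float* dist, int n) {
    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    for (int k = 0; k < n; ++k) {
        fw_step<<<grid, block>>>(dist, n, k);
    }
    cudaDeviceSynchronize();
}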
An exaflop is a measure of performance for a supercomputer that can calculate at least one quintillion floating point operations per second.
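In numeric terms, one quintillion is \(10^{18}\), so an exascale machine sustains

\[ 1\ \text{exaflop} = 10^{18}\ \text{floating point operations per second.} \]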
As Moore's Law slows down, it becomes increasingly important to develop other techniques that improve the performance of a chip at the same technology process node. Our approach uses AI to design smaller, faster, and more efficient circuits to deliver more performance with each chip generation. Vast arrays of arithmetic circuits have powered NVIDIA GPUs to achieve unprecedented acceleration…
Today during the 2022 NVIDIA GTC Keynote address, NVIDIA CEO Jensen Huang introduced the new NVIDIA H100 Tensor Core GPU based on the new NVIDIA Hopper GPU architecture. This post gives you a look inside the new H100 GPU and describes important new features of NVIDIA Hopper architecture GPUs. The NVIDIA H100 Tensor Core GPU is our ninth-generation data center GPU designed to deliver an…