Ashraf Eassa – NVIDIA Technical Blog

Ashraf Eassa – NVIDIA Technical Blog News and tutorials for developers, data scientists, and IT admins 2025-06-18T16:24:52Z http://www.open-lab.net/blog/feed/ Ashraf Eassa <![CDATA[NVIDIA Blackwell Delivers up to 2.6x Higher Performance in MLPerf Training v5.0]]> http://www.open-lab.net/blog/?p=101269 2025-06-18T16:24:52Z 2025-06-04T17:26:38Z

The journey to create a state-of-the-art large language model (LLM) begins with a process called pretraining. Pretraining a state-of-the-art model is...]]>

The journey to create a state-of-the-art large language model (LLM) begins with a process called pretraining. Pretraining a state-of-the-art model is computationally demanding, with popular open-weights models featuring tens to hundreds of billions parameters and trained using trillions of tokens. As model intelligence grows with increasing model parameter count and training dataset size…

]]> Ashraf Eassa <![CDATA[NVIDIA Blackwell Delivers Massive Performance Leaps in MLPerf Inference v5.0]]> http://www.open-lab.net/blog/?p=98367 2025-04-23T19:41:12Z 2025-04-02T18:14:48Z

The compute demands for large language model (LLM) inference are growing rapidly, fueled by the combination of growing model sizes, real-time latency...]]>

The compute demands for large language model (LLM) inference are growing rapidly, fueled by the combination of growing model sizes, real-time latency requirements, and, most recently, AI reasoning. At the same time, as AI adoption grows, the ability of an AI factory to serve as many users as possible, all while maintaining good per-user experiences, is key to maximizing the value it generates.

]]> Ashraf Eassa <![CDATA[NVIDIA Blackwell Delivers World-Record DeepSeek-R1 Inference Performance]]> http://www.open-lab.net/blog/?p=97352 2025-04-23T00:23:25Z 2025-03-18T17:41:42Z

NVIDIA announced world-record DeepSeek-R1 inference performance at NVIDIA GTC 2025. A single NVIDIA DGX system with eight NVIDIA Blackwell GPUs can achieve over...]]>

NVIDIA announced world-record DeepSeek-R1 inference performance at NVIDIA GTC 2025. A single NVIDIA DGX system with eight NVIDIA Blackwell GPUs can achieve over 250 tokens per second per user or a maximum throughput of over 30,000 tokens per second on the massive, state-of-the-art 671 billion parameter DeepSeek-R1 model. These rapid advancements in performance at both ends of the performance…

]]> 1 Ashraf Eassa <![CDATA[Optimize AI Inference Performance with NVIDIA Full-Stack Solutions]]> http://www.open-lab.net/blog/?p=95310 2025-05-30T00:55:04Z 2025-01-24T16:00:00Z

The explosion of AI-driven applications has placed unprecedented demands on both developers, who must balance delivering cutting-edge performance with managing...]]>

As of March 18, 2025, NVIDIA Triton Inference Server is now part of the NVIDIA Dynamo Platform and has been renamed to NVIDIA Dynamo Triton, accordingly. The explosion of AI-driven applications has placed unprecedented demands on both developers, who must balance delivering cutting-edge performance with managing operational complexity and cost, and AI infrastructure.

]]> Ashraf Eassa <![CDATA[Boost Llama 3.3 70B Inference Throughput 3x with NVIDIA TensorRT-LLM Speculative Decoding]]> http://www.open-lab.net/blog/?p=94146 2024-12-19T23:03:40Z 2024-12-17T17:00:00Z

Meta's Llama collection of open large language models (LLMs) continues to grow with the recent addition of Llama 3.3 70B, a text-only...]]>

Meta’s Llama collection of open large language models (LLMs) continues to grow with the recent addition of Llama 3.3 70B, a text-only instruction-tuned model. Llama 3.3 provides enhanced performance respective to the older Llama 3.1 70B model and can even match the capabilities of the larger, more computationally expensive Llama 3.1 405B model on several tasks including math, reasoning, coding…

]]> 2 Ashraf Eassa <![CDATA[Llama 3.2 Full-Stack Optimizations Unlock High Performance on NVIDIA GPUs]]> http://www.open-lab.net/blog/?p=90142 2024-11-22T23:11:53Z 2024-11-19T16:00:00Z

Meta recently released its Llama 3.2 series of vision language models (VLMs), which come in 11B parameter and 90B parameter variants. These models are...]]>

Meta recently released its Llama 3.2 series of vision language models (VLMs), which come in 11B parameter and 90B parameter variants. These models are multimodal, supporting both text and image inputs. In addition, Meta has launched text-only small language model (SLM) variants of Llama 3.2 with 1B and 3B parameters. NVIDIA has optimized the Llama 3.2 collection of models for great performance and…

]]> Ashraf Eassa <![CDATA[NVIDIA Blackwell Doubles LLM Training Performance in MLPerf Training v4.1]]> http://www.open-lab.net/blog/?p=91807 2024-11-14T17:10:37Z 2024-11-13T16:00:00Z

As models grow larger and are trained on more data, they become more capable, making them more useful. To train these models quickly, more performance,...]]>

As models grow larger and are trained on more data, they become more capable, making them more useful. To train these models quickly, more performance, delivered at data center scale, is required. The NVIDIA Blackwell platform, launched at GTC 2024 and now in full production, integrates seven types of chips: GPU, CPU, DPU, NVLink Switch chip, InfiniBand Switch, and Ethernet Switch.

]]> Ashraf Eassa <![CDATA[3x Faster AllReduce with NVSwitch and TensorRT-LLM MultiShot]]> http://www.open-lab.net/blog/?p=91412 2025-05-01T18:34:34Z 2024-11-01T22:00:36Z

Deploying generative AI workloads in production environments where user numbers can fluctuate from hundreds to hundreds of thousands �C and where input...]]>

Deploying generative AI workloads in production environments where user numbers can fluctuate from hundreds to hundreds of thousands – and where input sequence lengths differ with each request – poses unique challenges. To achieve low latency inference in these environments, multi-GPU setups are a must – irrespective of the GPU generation or its memory capacity. To enhance inference performance in…

]]> 1 Ashraf Eassa <![CDATA[NVIDIA GH200 Superchip Accelerates Inference by 2x in Multiturn Interactions with Llama Models]]> http://www.open-lab.net/blog/?p=90897 2024-11-06T02:24:56Z 2024-10-28T15:00:00Z

Deploying large language models (LLMs) in production environments often requires making hard trade-offs between enhancing user interactivity and increasing...]]>

Deploying large language models (LLMs) in production environments often requires making hard trade-offs between enhancing user interactivity and increasing system throughput. While enhancing user interactivity requires minimizing time to first token (TTFT), increasing throughput requires increasing tokens per second. Improving one aspect often results in the decline of the other…

]]> 1 Ashraf Eassa <![CDATA[NVIDIA Grace CPU Delivers World-Class Data Center Performance and Breakthrough Energy Efficiency]]> http://www.open-lab.net/blog/?p=90087 2024-11-06T02:26:22Z 2024-10-09T19:00:00Z

NVIDIA designed the NVIDIA Grace CPU to be a new kind of high-performance, data center CPU��one built to deliver breakthrough energy efficiency and optimized...]]>

NVIDIA designed the NVIDIA Grace CPU to be a new kind of high-performance, data center CPU—one built to deliver breakthrough energy efficiency and optimized for performance at data center scale. Accelerated computing is enabling giant leaps in performance and energy efficiency compared to traditional CPU computing. To deliver these speedups, full-stack innovation at data center scale is…

]]> Ashraf Eassa <![CDATA[Boosting Llama 3.1 405B Throughput by Another 1.5x on NVIDIA H200 Tensor Core GPUs and NVLink Switch]]> http://www.open-lab.net/blog/?p=90040 2024-11-22T23:12:12Z 2024-10-09T15:00:00Z

The continued growth of LLMs capability, fueled by increasing parameter counts and support for longer contexts, has led to their usage in a wide variety of...]]>

The continued growth of LLMs capability, fueled by increasing parameter counts and support for longer contexts, has led to their usage in a wide variety of applications, each with diverse deployment requirements. For example, a chatbot supports a small number of users at very low latencies for good interactivity. Meanwhile, synthetic data generation requires high throughput to process many items…

]]> 1 Ashraf Eassa <![CDATA[Low Latency Inference Chapter 2: Blackwell is Coming. NVIDIA GH200 NVL32 with NVLink Switch Gives Signs of Big Leap in Time to First Token Performance]]> http://www.open-lab.net/blog/?p=88938 2024-11-29T21:06:06Z 2024-09-26T21:44:00Z

Many of the most exciting applications of large language models (LLMs), such as interactive speech bots, coding co-pilots, and search, need to begin responding...]]>

Many of the most exciting applications of large language models (LLMs), such as interactive speech bots, coding co-pilots, and search, need to begin responding to user queries quickly to deliver positive user experiences. The time that it takes for an LLM to ingest a user prompt (and context, which can be sizable) and begin outputting a response is called time to first token (TTFT).

]]> Ashraf Eassa <![CDATA[NVIDIA GH200 Grace Hopper Superchip Delivers Outstanding Performance in MLPerf Inference v4.1]]> http://www.open-lab.net/blog/?p=89401 2024-11-06T02:27:00Z 2024-09-24T16:36:57Z

In the latest round of MLPerf Inference �C a suite of standardized, peer-reviewed inference benchmarks �C the NVIDIA platform delivered outstanding...]]>

In the latest round of MLPerf Inference – a suite of standardized, peer-reviewed inference benchmarks – the NVIDIA platform delivered outstanding performance across the board. Among the many submissions made using the NVIDIA platform were results using the NVIDIA GH200 Grace Hopper Superchip. GH200 tightly couples an NVIDIA Grace CPU with an NVIDIA Hopper GPU using NVIDIA NVLink-C2C…

]]> Ashraf Eassa <![CDATA[Low Latency Inference Chapter 1: Up to 1.9x Higher Llama 3.1 Performance with Medusa on NVIDIA HGX H200 with NVLink Switch]]> http://www.open-lab.net/blog/?p=88127 2024-11-29T21:06:37Z 2024-09-05T18:30:00Z

As large language models (LLMs) continue to grow in size and complexity, multi-GPU compute is a must-have to deliver the low latency and high throughput that...]]>

As large language models (LLMs) continue to grow in size and complexity, multi-GPU compute is a must-have to deliver the low latency and high throughput that real-time generative AI applications demand. Performance depends both on the ability for the combined GPUs to process requests as “one mighty GPU” with ultra-fast GPU-to-GPU communication and advanced software able to take full…

]]> Ashraf Eassa <![CDATA[Boosting Llama 3.1 405B Performance up to 1.44x with NVIDIA TensorRT Model Optimizer on NVIDIA H200 GPUs]]> http://www.open-lab.net/blog/?p=88017 2024-11-14T15:58:41Z 2024-08-28T19:30:00Z

The Llama 3.1 405B large language model (LLM), developed by Meta, is an open-source community model that delivers state-of-the-art performance and supports a...]]>

The Llama 3.1 405B large language model (LLM), developed by Meta, is an open-source community model that delivers state-of-the-art performance and supports a variety of use cases. With 405 billion parameters and support for context lengths of up to 128K tokens, Llama 3.1 405B is also one of the most demanding LLMs to run. To deliver both low latency to optimize the user experience and high…

]]> 1 Ashraf Eassa <![CDATA[NVIDIA Triton Inference Server Achieves Outstanding Performance in MLPerf Inference 4.1 Benchmarks]]> http://www.open-lab.net/blog/?p=87970 2024-09-05T18:37:49Z 2024-08-28T16:00:00Z

Six years ago, we embarked on a journey to develop an AI inference serving solution specifically designed for high-throughput and time-sensitive production use...]]>

Six years ago, we embarked on a journey to develop an AI inference serving solution specifically designed for high-throughput and time-sensitive production use cases from the ground up. At that time, ML developers were deploying bespoke, framework-specific AI solutions, which were driving up their operational costs and not meeting their latency and throughput service level agreements.

]]> Ashraf Eassa <![CDATA[NVIDIA Blackwell Platform Sets New LLM Inference Records in MLPerf Inference v4.1]]> http://www.open-lab.net/blog/?p=87957 2024-09-05T17:57:17Z 2024-08-28T15:00:00Z

Large language model (LLM) inference is a full-stack challenge. Powerful GPUs, high-bandwidth GPU-to-GPU interconnects, efficient acceleration libraries, and a...]]>

Large language model (LLM) inference is a full-stack challenge. Powerful GPUs, high-bandwidth GPU-to-GPU interconnects, efficient acceleration libraries, and a highly optimized inference engine are required for high-throughput, low-latency inference. MLPerf Inference v4.1 is the latest version of the popular and widely recognized MLPerf Inference benchmarks, developed by the MLCommons…

]]> 1 Ashraf Eassa <![CDATA[NVIDIA NVLink and NVIDIA NVSwitch Supercharge Large Language Model Inference]]> http://www.open-lab.net/blog/?p=87063 2024-08-22T18:25:32Z 2024-08-12T14:00:00Z

Large language models (LLM) are getting larger, increasing the amount of compute required to process inference requests. To meet real-time latency requirements...]]>

Large language models (LLM) are getting larger, increasing the amount of compute required to process inference requests. To meet real-time latency requirements for serving today’s LLMs and do so for as many users as possible, multi-GPU compute is a must. Low latency improves the user experience. High throughput reduces the cost of service. Both are simultaneously important. Even if a large…

]]> Ashraf Eassa <![CDATA[Revolutionizing Data Center Efficiency with the NVIDIA Grace Family]]> http://www.open-lab.net/blog/?p=86550 2024-10-09T20:01:54Z 2024-08-02T15:00:00Z

The exponential growth in data processing demand is projected to reach 175 zettabytes by 2025. This contrasts sharply with the slowing pace of CPU performance...]]>

The exponential growth in data processing demand is projected to reach 175 zettabytes by 2025. This contrasts sharply with the slowing pace of CPU performance improvements. For more than a decade, semiconductor advancements have not kept up with the pace predicted by Moore’s Law, leading to a pressing need for more efficient computing solutions. NVIDIA GPUs have emerged as the most efficient…

]]> Ashraf Eassa <![CDATA[NVIDIA NeMo Accelerates LLM Innovation with Hybrid State Space Model Support]]> http://www.open-lab.net/blog/?p=85602 2024-08-08T18:48:47Z 2024-07-17T17:32:08Z

Today��s large language models (LLMs) are based on the transformer model architecture introduced in 2017. Since then, rapid advances in AI compute performance...]]>

Today’s large language models (LLMs) are based on the transformer model architecture introduced in 2017. Since then, rapid advances in AI compute performance have enabled the creation of even larger transformer-based LLMs, dramatically improving their capabilities. Advanced transformer-based LLMs are enabling many exciting applications such as intelligent chatbots, computer code generation…

]]> 1 Ashraf Eassa <![CDATA[Achieving High Mixtral 8x7B Performance with NVIDIA H100 Tensor Core GPUs and NVIDIA TensorRT-LLM]]> http://www.open-lab.net/blog/?p=84749 2024-08-07T23:50:14Z 2024-07-02T18:00:00Z

As large language models (LLMs) continue to grow in size and complexity, the performance requirements for serving them quickly and cost-effectively continue to...]]>

As large language models (LLMs) continue to grow in size and complexity, the performance requirements for serving them quickly and cost-effectively continue to grow. Delivering high LLM inference performance requires an efficient parallel computing architecture and a flexible and highly optimized software stack. Recently, NVIDIA Hopper GPUs running NVIDIA TensorRT-LLM inference software set…

]]> Ashraf Eassa <![CDATA[NVIDIA Sets New Generative AI Performance and Scale Records in MLPerf Training v4.0]]> http://www.open-lab.net/blog/?p=83776 2024-06-27T18:18:05Z 2024-06-12T15:00:00Z

Generative AI models have a variety of uses, such as helping write computer code, crafting stories, composing music, generating images, producing videos, and...]]>

Generative AI models have a variety of uses, such as helping write computer code, crafting stories, composing music, generating images, producing videos, and more. And, as these models continue to grow in size and are trained on even more data, they are producing even higher-quality outputs. Building and deploying these more intelligent models is incredibly compute-intensive…

]]> Ashraf Eassa <![CDATA[Accelerate Generative AI Inference Performance with NVIDIA TensorRT Model Optimizer, Now Publicly Available]]> http://www.open-lab.net/blog/?p=81860 2024-06-13T22:22:46Z 2024-05-08T19:00:00Z

In the fast-evolving landscape of generative AI, the demand for accelerated inference speed remains a pressing concern. With the exponential growth in model...]]>

In the fast-evolving landscape of generative AI, the demand for accelerated inference speed remains a pressing concern. With the exponential growth in model size and complexity, the need to swiftly produce results to serve numerous users simultaneously continues to grow. The NVIDIA platform stands at the forefront of this endeavor, delivering perpetual performance leaps through innovations across…

]]> 3 Ashraf Eassa <![CDATA[NVIDIA H200 Tensor Core GPUs and NVIDIA TensorRT-LLM Set MLPerf LLM Inference Records]]> http://www.open-lab.net/blog/?p=80197 2024-11-14T15:53:12Z 2024-03-27T15:29:05Z

Generative AI is unlocking new computing applications that greatly augment human capability, enabled by continued model innovation. Generative AI...]]>

Generative AI is unlocking new computing applications that greatly augment human capability, enabled by continued model innovation. Generative AI models—including large language models (LLMs)—are used for crafting marketing copy, writing computer code, rendering detailed images, composing music, generating videos, and more. The amount of compute required by the latest models is immense and…

]]> Ashraf Eassa <![CDATA[Achieving Top Inference Performance with the NVIDIA H100 Tensor Core GPU and NVIDIA TensorRT-LLM]]> http://www.open-lab.net/blog/?p=75194 2023-12-14T23:33:04Z 2023-12-14T19:59:00Z

Best-in-class AI performance requires an efficient parallel computing architecture, a productive tool stack, and deeply optimized algorithms. NVIDIA released...]]>

Best-in-class AI performance requires an efficient parallel computing architecture, a productive tool stack, and deeply optimized algorithms. NVIDIA released the open-source NVIDIA TensorRT-LLM, which includes the latest kernel optimizations for the NVIDIA Hopper architecture at the heart of the NVIDIA H100 Tensor Core GPU. These optimizations enable models like Llama 2 70B to execute using…

]]> 1 Ashraf Eassa <![CDATA[NVIDIA TensorRT-LLM Enhancements Deliver Massive Large Language Model Speedups on NVIDIA H200]]> http://www.open-lab.net/blog/?p=74771 2023-12-14T19:27:30Z 2023-12-05T01:11:43Z

Large language models (LLMs) have seen dramatic growth over the last year, and the challenge of delivering great user experiences depends on both high-compute...]]>

Large language models (LLMs) have seen dramatic growth over the last year, and the challenge of delivering great user experiences depends on both high-compute throughput as well as large amounts of high-bandwidth memory. NVIDIA TensorRT-LLM provides optimizations for both peak throughput and memory optimization, delivering massive improvements in LLM inference performance.

]]> 0 Ashraf Eassa <![CDATA[New NVIDIA NeMo Framework Features and NVIDIA H200 Supercharge LLM Training Performance and Versatility]]> http://www.open-lab.net/blog/?p=74615 2024-04-15T19:49:44Z 2023-12-04T18:00:00Z

The rapid growth in the size, complexity, and diversity of large language models (LLMs) continues to drive an insatiable need for AI training performance....]]>

The rapid growth in the size, complexity, and diversity of large language models (LLMs) continues to drive an insatiable need for AI training performance. Delivering top performance requires the ability to train models at the scale of an entire data center efficiently. This is achieved through exceptional craftsmanship at every layer of the technology stack, spanning chips, systems, and software.

]]> 0 Ashraf Eassa <![CDATA[Setting New Records at Data Center Scale Using NVIDIA H100 GPUs and NVIDIA Quantum-2 InfiniBand]]> http://www.open-lab.net/blog/?p=72467 2023-11-24T18:36:30Z 2023-11-08T17:00:00Z

Generative AI is rapidly transforming computing, unlocking new use cases and turbocharging existing ones. Large language models (LLMs), such as OpenAI��s GPT...]]>

Generative AI is rapidly transforming computing, unlocking new use cases and turbocharging existing ones. Large language models (LLMs), such as OpenAI’s GPT models and Meta’s Llama 2, skillfully perform a variety of tasks on text-based content. These tasks include summarization, translation, classification, and generation of new content such as computer code, marketing copy, poetry, and much more.

]]> 0 Ashraf Eassa <![CDATA[Leading MLPerf Inference v3.1 Results with NVIDIA GH200 Grace Hopper Superchip Debut]]> http://www.open-lab.net/blog/?p=70450 2023-09-22T16:17:33Z 2023-09-09T16:00:00Z

AI is transforming computing, and inference is how the capabilities of AI are deployed in the world��s applications. Intelligent chatbots, image and video...]]>

AI is transforming computing, and inference is how the capabilities of AI are deployed in the world’s applications. Intelligent chatbots, image and video synthesis from simple text prompts, personalized content recommendations, and medical imaging are just a few examples of AI-powered applications. Inference workloads are both computationally demanding and diverse, requiring that platforms be…

]]> 1 Ashraf Eassa <![CDATA[New MLPerf Inference Network Division Showcases NVIDIA InfiniBand and GPUDirect RDMA Capabilities]]> http://www.open-lab.net/blog/?p=67021 2023-07-27T18:54:26Z 2023-07-06T16:00:00Z

In MLPerf Inference v3.0, NVIDIA made its first submissions to the newly introduced Network division, which is now part of the MLPerf Inference Datacenter...]]>

In MLPerf Inference v3.0, NVIDIA made its first submissions to the newly introduced Network division, which is now part of the MLPerf Inference Datacenter suite. The Network division is designed to simulate a real data center setup and strives to include the effect of networking—including both hardware and software—in end-to-end inference performance. In the Network division…

]]> 0 Ashraf Eassa <![CDATA[Breaking MLPerf Training Records with NVIDIA H100 GPUs]]> http://www.open-lab.net/blog/?p=66919 2023-07-13T19:00:28Z 2023-06-27T16:00:00Z

At the heart of the rapidly expanding set of AI-powered applications are powerful AI models. Before these models can be deployed, they must be trained through a...]]>

At the heart of the rapidly expanding set of AI-powered applications are powerful AI models. Before these models can be deployed, they must be trained through a process that requires an immense amount of AI computing power. AI training is also an ongoing process, with models constantly retrained with new data to ensure high-quality results. Faster model training means that AI-powered applications…

]]> 0 Ashraf Eassa <![CDATA[Setting New Records in MLPerf Inference v3.0 with Full-Stack Optimizations for AI]]> http://www.open-lab.net/blog/?p=62958 2023-07-05T19:23:50Z 2023-04-05T19:10:55Z

The most exciting computing applications currently rely on training and running inference on complex AI models, often in demanding, real-time deployment...]]>

The most exciting computing applications currently rely on training and running inference on complex AI models, often in demanding, real-time deployment scenarios. High-performance, accelerated AI platforms are needed to meet the demands of these applications and deliver the best user experiences. New AI models are constantly being invented to enable new capabilities…

]]> 0 Ashraf Eassa <![CDATA[Leading MLPerf Training 2.1 with Full Stack Optimizations for AI]]> http://www.open-lab.net/blog/?p=57148 2023-07-05T19:26:09Z 2022-11-09T18:00:00Z

MLPerf benchmarks, developed by MLCommons, are critical evaluation tools for organizations to measure the performance of their machine learning models' training...]]>

MLPerf benchmarks, developed by MLCommons, are critical evaluation tools for organizations to measure the performance of their machine learning models’ training across workloads. MLPerf Training v2.1—the seventh iteration of this AI training-focused benchmark suite—tested performance across a breadth of popular AI use cases, including the following: Many AI applications take advantage of…

]]> 0 Ashraf Eassa <![CDATA[Full-Stack Innovation Fuels Highest MLPerf Inference 2.1 Results for NVIDIA]]> http://www.open-lab.net/blog/?p=54638 2023-07-05T19:26:31Z 2022-09-08T18:10:00Z

Today��s AI-powered applications are enabling richer experiences, fueled by both larger and more complex AI models as well as the application of many models in...]]>

Today’s AI-powered applications are enabling richer experiences, fueled by both larger and more complex AI models as well as the application of many models in a pipeline. To meet the increasing demands of AI-infused applications, an AI platform must not only deliver high performance but also be versatile enough to deliver that performance across a diverse range of AI models.

]]> 0 Ashraf Eassa <![CDATA[Inside NVIDIA Grace CPU: NVIDIA Amps Up Superchip Engineering for HPC and AI]]> http://www.open-lab.net/blog/?p=52500 2023-05-23T23:41:15Z 2022-08-23T15:00:00Z

NVIDIA Grace CPU is the first data center CPU developed by NVIDIA. It has been built from the ground up to create the world��s first superchips. Designed...]]>

NVIDIA Grace CPU is the first data center CPU developed by NVIDIA. It has been built from the ground up to create the world’s first superchips. Designed to deliver excellent performance and energy efficiency to meet the demands of modern data center workloads powering digital twins, cloud gaming and graphics, AI, and high-performance computing (HPC), NVIDIA Grace CPU features 72 Armv9 CPU…

]]> 0 Ashraf Eassa <![CDATA[The Full Stack Optimization Powering NVIDIA MLPerf Training v2.0 Performance]]> http://www.open-lab.net/blog/?p=49597 2023-07-05T19:27:00Z 2022-06-30T18:00:00Z

MLPerf benchmarks are developed by a consortium of AI leaders across industry, academia, and research labs, with the aim of providing standardized, fair, and...]]>

MLPerf benchmarks are developed by a consortium of AI leaders across industry, academia, and research labs, with the aim of providing standardized, fair, and useful measures of deep learning performance. MLPerf training focuses on measuring time to train a range of commonly used neural networks for the following tasks: Lower training times are important to speed time to deployment…

]]> 0 Ashraf Eassa <![CDATA[Fueling High-Performance Computing with Full-Stack Innovation]]> http://www.open-lab.net/blog/?p=48769 2023-07-05T19:27:52Z 2022-06-02T18:45:00Z

High-performance computing (HPC) has become the essential instrument of scientific discovery. Whether it is discovering new, life-saving drugs, battling...]]>

High-performance computing (HPC) has become the essential instrument of scientific discovery. Whether it is discovering new, life-saving drugs, battling climate change, or creating accurate simulations of our world, these solutions demand an enormous—and rapidly growing—amount of processing power. They are increasingly out of reach of traditional computing approaches.

]]> 1 Ashraf Eassa <![CDATA[Saving Time and Money in the Cloud with the Latest NVIDIA-Powered Instances]]> http://www.open-lab.net/blog/?p=44315 2023-07-05T19:28:41Z 2022-03-01T19:13:57Z

AI is transforming every industry, enabling powerful new applications and use cases that simply weren��t possible with traditional software. As AI continues to...]]>

AI is transforming every industry, enabling powerful new applications and use cases that simply weren’t possible with traditional software. As AI continues to proliferate, and with the size and complexity of AI models on the rise, significant advances in AI compute performance are required to keep up. That’s where the NVIDIA platform comes in. With a full-stack approach spanning chips…

]]> 1 ��˳��97caoporen��