Shar Narasimhan – NVIDIA Technical Blog News and tutorials for developers, data scientists, and IT admins 2025-07-08T15:27:34Z http://www.open-lab.net/blog/feed/ Shar Narasimhan <![CDATA[Asking an Encyclopedia-Sized Question: How To Make the World Smarter with Multi-Million Token Real-Time Inference]]> http://www.open-lab.net/blog/?p=102927 2025-07-08T15:27:34Z 2025-07-08T01:00:00Z Modern AI applications increasingly rely on models that combine huge parameter counts with multi-million-token context windows. Whether it is AI agents...]]>

Modern AI applications increasingly rely on models that combine huge parameter counts with multi-million-token context windows. Whether it is AI agents following months of conversation, legal assistants reasoning through gigabytes of case law as big as an entire encyclopedia set, or coding copilots navigating sprawling repositories, preserving long-range context is essential for relevance and…

Source

]]>
Shar Narasimhan <![CDATA[Strategies for Maximizing Data Center Energy Efficiency]]> http://www.open-lab.net/blog/?p=65020 2023-06-09T20:25:50Z 2023-05-23T15:00:00Z Data centers are an essential part of a modern enterprise, but they come with a hefty energy cost. To complicate matters, energy costs are rising and the need...]]>

Data centers are an essential part of a modern enterprise, but they come with a hefty energy cost. To complicate matters, energy costs are rising and the need for data centers continues to expand, with a market size projected to grow 25% from 2023 to 2030. Globally, energy costs are already negatively affecting data centers and high-performance computing (HPC) systems. To alleviate the energy…

Source

]]>
Shar Narasimhan <![CDATA[NVIDIA, Arm, and Intel Publish FP8 Specification for Standardization as an Interchange Format for AI]]> http://www.open-lab.net/blog/?p=54825 2023-02-13T19:01:09Z 2022-09-14T15:00:00Z AI processing requires full-stack innovation across hardware and software platforms to address the growing computational demands of neural networks. A key area...]]>

AI processing requires full-stack innovation across hardware and software platforms to address the growing computational demands of neural networks. A key area to drive efficiency is using lower precision number formats to improve computational efficiency, reduce memory usage, and optimize for interconnect bandwidth. To realize these benefits, the industry has moved from 32-bit precisions to…
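The memory-usage side of the move to lower precision can be sketched in a few lines. NumPy has no native FP8 type, so float16 stands in for the narrower formats here; an FP8 format such as E4M3 or E5M2 would halve the footprint again. The tensor size is arbitrary and chosen only for illustration.

```python
import numpy as np

# Illustrative sketch: the same tensor stored at two precisions.
n = 1_000_000
fp32 = np.ones(n, dtype=np.float32)   # 4 bytes per element
fp16 = fp32.astype(np.float16)        # 2 bytes per element

print(fp32.nbytes)                    # 4000000 bytes
print(fp16.nbytes)                    # 2000000 bytes
print(fp32.nbytes // fp16.nbytes)     # 2x memory reduction
```

The same halving applies to interconnect traffic: fewer bytes per value means more values per unit of bandwidth, which is part of why the industry keeps moving down the precision ladder.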

Source

]]>
Shar Narasimhan <![CDATA[Accelerating AI Inference Workloads with NVIDIA A30 GPU]]> http://www.open-lab.net/blog/?p=47944 2022-08-30T18:58:43Z 2022-05-11T22:43:14Z NVIDIA A30 GPU is built on the latest NVIDIA Ampere Architecture to accelerate diverse workloads like AI inference at scale, enterprise training, and HPC...]]>

NVIDIA A30 GPU is built on the latest NVIDIA Ampere Architecture to accelerate diverse workloads like AI inference at scale, enterprise training, and HPC applications for mainstream servers in data centers. The A30 PCIe card combines the third-generation Tensor Cores with large HBM2 memory (24 GB) and fast GPU memory bandwidth (933 GB/s) in a low-power envelope (maximum 165 W).
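A quick back-of-the-envelope calculation, using only the two figures quoted above, shows what that bandwidth means in practice: streaming the card's entire 24 GB of HBM2 once at peak bandwidth takes roughly 26 ms.

```python
# Sketch using the A30 numbers from the excerpt above.
memory_gb = 24        # HBM2 capacity
bandwidth_gb_s = 933  # peak GPU memory bandwidth

stream_time_ms = memory_gb / bandwidth_gb_s * 1000
print(round(stream_time_ms, 1))  # ~25.7 ms per full-memory pass
```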

Source

]]>
Shar Narasimhan <![CDATA[Boosting NVIDIA MLPerf Training v1.1 Performance with Full Stack Optimization]]> http://www.open-lab.net/blog/?p=41919 2023-07-05T19:29:06Z 2021-12-01T21:33:20Z Five months have passed since v1.0, so it is time for another round of the MLPerf training benchmark. In this v1.1 edition, optimization over the entire...]]>

Five months have passed since v1.0, so it is time for another round of the MLPerf training benchmark. In this v1.1 edition, optimization over the entire hardware and software stack sees continuing improvement across the benchmarking suite for submissions based on the NVIDIA platform. This improvement is observed consistently at all different scales, from single machines all the way to industrial…

Source

]]>
Shar Narasimhan <![CDATA[MLPerf v1.0 Training Benchmarks: Insights into a Record-Setting NVIDIA Performance]]> http://www.open-lab.net/blog/?p=33929 2023-07-05T19:31:00Z 2021-06-30T17:00:00Z MLPerf is an industry-wide AI consortium tasked with developing a suite of performance benchmarks that cover a range of leading AI workloads widely in use. The...]]>

MLPerf is an industry-wide AI consortium tasked with developing a suite of performance benchmarks that cover a range of leading AI workloads widely in use. The latest MLPerf v1.0 training round includes vision, language, recommender system, and reinforcement learning tasks. It is continually evolving to reflect state-of-the-art AI applications. NVIDIA submitted MLPerf v1.0…

Source

]]>
Shar Narasimhan <![CDATA[Updating AI Product Performance from Throughput to Time-To-Solution]]> http://www.open-lab.net/blog/?p=22364 2023-07-05T19:33:54Z 2020-11-23T17:00:06Z Data scientists and researchers work toward solving the grand challenges of humanity with AI projects such as developing autonomous cars or nuclear fusion...]]>

Data scientists and researchers work toward solving the grand challenges of humanity with AI projects such as developing autonomous cars or nuclear fusion energy research. They depend on powerful, high-performance AI platforms as essential tools to conduct their work. Even enterprise-grade AI implementation efforts—adding intelligent video analytics to existing video camera streams or image…
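The shift from throughput to time-to-solution can be made concrete with a hypothetical comparison. All numbers below are invented for illustration; the point is only that time-to-solution divides the work needed to converge by the rate at which work is done, so a system with higher raw throughput is not automatically faster to a trained model.

```python
def time_to_solution_hours(samples_to_converge: int, throughput_per_s: float) -> float:
    """Hours to reach a converged model: total samples / samples-per-second."""
    return samples_to_converge / throughput_per_s / 3600

# System A: higher raw throughput, but its setup needs more samples to converge.
a = time_to_solution_hours(samples_to_converge=5_000_000_000, throughput_per_s=60_000)
# System B: lower throughput, but converges in fewer samples.
b = time_to_solution_hours(samples_to_converge=3_000_000_000, throughput_per_s=45_000)

print(round(a, 1))  # ~23.1 hours
print(round(b, 1))  # ~18.5 hours — faster to solution despite lower throughput
```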

Source

]]>
Shar Narasimhan <![CDATA[NVIDIA Clocks World's Fastest BERT Training Time and Largest Transformer Based Model, Paving Path For Advanced Conversational AI]]> http://www.open-lab.net/blog/?p=15430 2022-08-21T23:39:34Z 2019-08-13T13:00:23Z NVIDIA DGX SuperPOD trains BERT-Large in just 47 minutes, and trains GPT-2 8B, the largest Transformer network ever with 8.3Bn parameters. Conversational AI...]]>

NVIDIA DGX SuperPOD trains BERT-Large in just 47 minutes and trains GPT-2 8B, the largest Transformer network ever, with 8.3Bn parameters. Conversational AI is an essential building block of human interactions with intelligent machines and applications – from robots and cars, to home assistants and mobile apps. Getting computers to understand human languages, with all their nuances…

Source

]]>
Shar Narasimhan <![CDATA[New Optimizations To Accelerate Deep Learning Training on NVIDIA GPUs]]> http://www.open-lab.net/blog/?p=12964 2023-02-13T17:46:37Z 2018-12-03T16:00:36Z The pace of AI adoption across diverse industries depends on maximizing data scientists' productivity. NVIDIA releases optimized NGC containers every month...]]>

The pace of AI adoption across diverse industries depends on maximizing data scientists’ productivity. NVIDIA releases optimized NGC containers every month with improved performance for deep learning frameworks and libraries, helping scientists maximize their potential. NVIDIA continuously invests in the full data science stack, including GPU architecture, systems, and software stacks.

Source

]]>