Training AI models on massive GPU clusters presents significant challenges for model builders. Because manual intervention becomes impractical as job scale increases, automation is critical to maintaining high GPU utilization and training productivity. An exceptional training experience requires resilient systems that provide low-latency error attribution and automatic failover based on root…
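To illustrate the kind of automation described here, below is a minimal sketch of a supervisory loop that restarts training from the last checkpoint when a step fails. The `train_step`, `load_checkpoint`, and `save_checkpoint` helpers, the retry budget, and the backoff interval are all hypothetical stand-ins, not the API of any specific library.

```python
import logging
import time

MAX_RESTARTS = 3  # hypothetical retry budget before escalating to an operator

def resilient_training_loop(train_step, load_checkpoint, save_checkpoint,
                            num_steps, checkpoint_every=100):
    """Hypothetical supervisor: retry failed steps from the last checkpoint."""
    state = load_checkpoint()          # resume from the most recent checkpoint
    restarts = 0
    step = state["step"]
    while step < num_steps:
        try:
            state = train_step(state)  # one forward/backward/optimizer step
            step = state["step"]
            if step % checkpoint_every == 0:
                save_checkpoint(state)
        except RuntimeError as err:
            # Attribute the error first, then decide how to recover.
            logging.error("step %d failed: %s", step, err)
            restarts += 1
            if restarts > MAX_RESTARTS:
                raise                  # persistent fault: stop and page a human
            time.sleep(10)             # back off, then fail over to the checkpoint
            state = load_checkpoint()
            step = state["step"]
    return state
```

In a real cluster the recovery decision would also consider which node or GPU raised the error, but the structure, classify the failure and automatically resume from a known-good state, is the same.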
First introduced in 2019, NVIDIA Megatron-LM sparked a wave of innovation in the AI community, enabling researchers and developers to use the underpinnings of this open-source library to further large language model (LLM) advancements. Today, many of the most popular LLM developer frameworks have been inspired by and built using the Megatron-LM library, spurring a wave of foundation models and AI…
Natural language processing (NLP) has seen rapid progress in recent years as computation at scale has become more available and datasets have grown larger. At the same time, recent work has shown large language models to be effective few-shot learners, achieving high accuracy on many NLP datasets without additional fine-tuning. As a result, state-of-the-art NLP models have grown at an exponential rate…
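To make the few-shot idea concrete, a prompt can embed a handful of labeled examples so the model simply continues the pattern, with no gradient updates at all. The task, examples, and formatting below are illustrative, not drawn from the article.

```python
# Hypothetical few-shot sentiment prompt: the model is asked to continue the
# pattern established by the in-context examples, without any fine-tuning.
few_shot_prompt = (
    "Review: The plot was predictable and the acting flat.\n"
    "Sentiment: negative\n\n"
    "Review: A moving story with stunning cinematography.\n"
    "Sentiment: positive\n\n"
    "Review: I would happily watch this again.\n"
    "Sentiment:"
)
# A sufficiently capable language model completing this prompt is expected
# to output "positive".
print(few_shot_prompt)
```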