Janaki Vamaraju – NVIDIA Technical Blog
News and tutorials for developers, data scientists, and IT admins
Feed: http://www.open-lab.net/blog/feed/ (last updated 2025-05-29)

Janaki Vamaraju
Building Nemotron-CC, A High-Quality Trillion Token Dataset for LLM Pretraining from Common Crawl Using NVIDIA NeMo Curator
http://www.open-lab.net/blog/?p=99540
Published 2025-05-07 | Updated 2025-05-29

Curating high-quality pretraining datasets is critical for enterprise developers aiming to train state-of-the-art large language models (LLMs). To enable developers to build highly accurate LLMs, NVIDIA previously released Nemotron-CC, a 6.3-trillion-token English-language Common Crawl (CC) dataset. Today, the NVIDIA NeMo Curator team is excited to share that the pipeline used to build the…

Source
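As a rough illustration of the kind of curation step such a pipeline performs, here is a minimal quality-filtering sketch using NeMo Curator. The API names (DocumentDataset, Sequential, ScoreFilter, WordCountFilter, RepeatingTopNGramsFilter) reflect recent NeMo Curator releases and may differ across versions; the input path and filter thresholds are illustrative assumptions, not the Nemotron-CC production settings.

```python
# Minimal sketch: heuristic quality filtering of extracted Common Crawl text
# with NVIDIA NeMo Curator. Paths and thresholds are illustrative assumptions.
from nemo_curator import ScoreFilter, Sequential
from nemo_curator.datasets import DocumentDataset
from nemo_curator.filters import RepeatingTopNGramsFilter, WordCountFilter

# Load extracted text stored as JSONL files with a "text" field
dataset = DocumentDataset.read_json("extracted_cc/*.jsonl")

# Chain simple heuristics: drop very short documents and highly repetitive ones
pipeline = Sequential([
    ScoreFilter(WordCountFilter(min_words=50), text_field="text"),
    ScoreFilter(
        RepeatingTopNGramsFilter(n=5, max_repeating_ngram_ratio=0.15),
        text_field="text",
    ),
])

filtered = pipeline(dataset)
filtered.to_json("filtered_cc/")
```

The full Nemotron-CC pipeline goes well beyond heuristics (for example, model-based quality classification and deduplication); this sketch only shows the general shape of a filtering stage.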

Janaki Vamaraju
Advanced RAG Techniques for Telco O-RAN Specifications Using NVIDIA NIM Microservices
http://www.open-lab.net/blog/?p=90153
Published 2024-10-10 | Updated 2024-11-20

Mobile communication standards play a crucial role in the telecommunications ecosystem by harmonizing technology protocols to facilitate interoperability between networks and devices from different vendors. As these standards evolve, telecommunications companies face the ongoing challenge of managing complexity and volume. By leveraging generative AI, telecommunications companies can automate…

Source
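To make the approach concrete, below is a minimal sketch of the generation step of such a RAG pipeline against a hosted, OpenAI-compatible NIM endpoint. The endpoint URL and model name are assumptions for illustration, and the retrieved chunks are passed in directly; the post's actual pipeline (chunking, embedding, retrieval, reranking) is not shown here.

```python
# Minimal sketch: answer a question grounded in retrieved O-RAN spec excerpts
# via an OpenAI-compatible NVIDIA NIM endpoint. Endpoint and model are assumed.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # hosted NIM API (assumed)
    api_key=os.environ["NVIDIA_API_KEY"],
)

def answer_with_context(question: str, retrieved_chunks: list[str]) -> str:
    """Generate an answer constrained to the retrieved specification text."""
    context = "\n\n".join(retrieved_chunks)
    response = client.chat.completions.create(
        model="meta/llama-3.1-70b-instruct",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "Answer only from the provided O-RAN spec excerpts."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
        temperature=0.2,
    )
    return response.choices[0].message.content

# retrieved_chunks would come from a vector store queried with an embedding NIM
```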

Janaki Vamaraju
Streamlining Data Processing for Domain Adaptive Pretraining with NVIDIA NeMo Curator
http://www.open-lab.net/blog/?p=87876
Published 2024-09-10 | Updated 2024-10-18

Domain-adaptive pretraining (DAPT) of large language models (LLMs) is an important step towards building domain-specific models. These models demonstrate greater capabilities in domain-specific tasks compared to their off-the-shelf open or commercial counterparts. Recently, NVIDIA published a paper about ChipNeMo, a family of foundation models that are geared toward industrial chip design…

Source
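As an illustration of typical DAPT preprocessing, here is a minimal NeMo Curator sketch that assigns stable document IDs and normalizes unicode before training. API names (AddId, Modify, UnicodeReformatter, Sequential) reflect recent NeMo Curator releases and may vary; the paths and ID prefix are illustrative assumptions, not the ChipNeMo setup.

```python
# Minimal sketch: two common DAPT preprocessing steps with NeMo Curator --
# assigning per-document IDs (useful for later deduplication) and fixing
# mojibake / inconsistent unicode. Paths and the ID prefix are illustrative.
from nemo_curator import AddId, Modify, Sequential
from nemo_curator.datasets import DocumentDataset
from nemo_curator.modifiers import UnicodeReformatter

dataset = DocumentDataset.read_json("domain_corpus/*.jsonl")

pipeline = Sequential([
    AddId(id_field="id", id_prefix="chip"),           # stable per-document IDs
    Modify(UnicodeReformatter(), text_field="text"),  # normalize unicode
])

cleaned = pipeline(dataset)
cleaned.to_json("cleaned_corpus/")
```

Downstream DAPT steps such as deduplication and domain-specific filtering would operate on the cleaned, ID-tagged corpus this stage produces.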
