Posts by Janaki Vamaraju
Generative AI
May 07, 2025
Building Nemotron-CC, A High-Quality Trillion Token Dataset for LLM Pretraining from Common Crawl Using NVIDIA NeMo Curator
Curating high-quality pretraining datasets is critical for enterprise developers aiming to train state-of-the-art large language models (LLMs). To enable...
7 MIN READ
Generative AI
Oct 10, 2024
Advanced RAG Techniques for Telco O-RAN Specifications Using NVIDIA NIM Microservices
Mobile communication standards play a crucial role in the telecommunications ecosystem by harmonizing technology protocols to facilitate interoperability...
8 MIN READ
Generative AI
Sep 10, 2024
Streamlining Data Processing for Domain Adaptive Pretraining with NVIDIA NeMo Curator
Domain-adaptive pretraining (DAPT) of large language models (LLMs) is an important step towards building domain-specific models. These models demonstrate...
16 MIN READ