Synthetic Data Generation

Jun 11, 2025
Advancing Agentic AI with NVIDIA Nemotron Open Reasoning Models
As AI progresses toward greater autonomy, the emergence of AI agents capable of independent decision-making marks a significant milestone. To function...
6 MIN READ

Jun 11, 2025
Simplify End-to-End Autonomous Vehicle Development with New NVIDIA Cosmos World Foundation Models
The shift to end-to-end planning models for powering autonomous vehicles (AVs) is increasing the demand for high-quality, physically-based sensor data. These...
7 MIN READ

Jun 11, 2025
Accelerating AV Simulation with Neural Reconstruction and World Foundation Models
Autonomous vehicle (AV) stacks are evolving from a hierarchy of discrete building blocks to end-to-end architectures built on foundation models. This transition...
7 MIN READ

Jun 11, 2025
Develop Custom Physical AI Foundation Models with NVIDIA Cosmos Predict-2
Building smarter robots and autonomous vehicles (AVs) starts with physical AI models that understand real-world dynamics. These models serve two critical roles:...
7 MIN READ

May 14, 2025
Build Custom Reasoning Models with Advanced, Open Post-Training Datasets
Synthetic data has become a standard part of large language model (LLM) post-training procedures. Using a large number of synthetically generated examples from...
5 MIN READ

May 07, 2025
Building Nemotron-CC, A High-Quality Trillion Token Dataset for LLM Pretraining from Common Crawl Using NVIDIA NeMo Curator
Curating high-quality pretraining datasets is critical for enterprise developers aiming to train state-of-the-art large language models (LLMs). To enable...
7 MIN READ

Apr 07, 2025
Evaluating and Enhancing RAG Pipeline Performance Using Synthetic Data
As large language models (LLMs) gain popularity in various question-answering systems, retrieval-augmented generation (RAG) pipelines have also become a focal...
11 MIN READ

Jan 29, 2025
Mastering LLM Techniques: Evaluation
Evaluating large language models (LLMs) and retrieval-augmented generation (RAG) systems is a complex and nuanced process, reflecting the sophisticated and...
12 MIN READ

Jan 09, 2025
Advancing Physical AI with NVIDIA Cosmos World Foundation Model Platform
As robotics and autonomous vehicles advance, accelerating development of physical AI—which enables autonomous machines to perceive, understand, and perform...
14 MIN READ

Jan 06, 2025
How to Build a Generative AI-Enabled Synthetic Data Pipeline for Perception-Based Physical AI
Training physical AI models used to power autonomous machines, such as robots and autonomous vehicles, requires huge amounts of data. Acquiring large sets of...
7 MIN READ

Dec 03, 2024
Scaling Action Recognition Models with Synthetic Data
Action recognition models such as PoseClassificationNet have been around for some time, helping systems identify and classify human actions like walking,...
11 MIN READ

Sep 30, 2024
Improve Reinforcement Learning from Human Feedback with Leaderboard-Topping Reward Model
Llama 3.1 Nemotron 70B Reward model helps generate high-quality training data that aligns with human preferences for finance, retail, healthcare, scientific...
1 MIN READ

Aug 27, 2024
Simplifying Camera Calibration to Enhance AI-Powered Multi-Camera Tracking
This post is the third in a series on building multi-camera tracking vision AI applications. We introduce the overall end-to-end workflow and fine-tuning...
12 MIN READ

Aug 16, 2024
Leverage the Latest Open Models for Synthetic Data Generation with NVIDIA Nemotron-4-340B
The Llama-3.1-Nemotron 70B-Reward model helps generate high-quality training data that aligns with human preferences for finance, retail,...
8 MIN READ

Jul 23, 2024
Creating Synthetic Data Using Llama 3.1 405B
Synthetic data isn't about creating new information. It's about transforming existing information to create different variants. For over a decade, synthetic...
15 MIN READ

Jun 24, 2024
Addressing Medical Imaging Limitations with Synthetic Data Generation
Synthetic data in medical imaging offers numerous benefits, including the ability to augment datasets with diverse and realistic images where real data is...
9 MIN READ