
    Enhance Robot Learning with Synthetic Trajectory Data Generated by World Foundation Models

    Generalist robots have arrived, powered by advances in mechatronics and robot AI foundation models. But a key bottleneck remains: robots need vast training data for skills like assembly and inspection, and manual demonstrations aren’t scalable. The NVIDIA Isaac GR00T-Dreams blueprint, built on NVIDIA Cosmos, solves this challenge by generating massive amounts of synthetic trajectory data from just a single image and a language prompt.

    Using Cosmos world foundation models (WFMs) and generative AI, developers can rapidly create training data for models such as NVIDIA Isaac GR00T N1.5, the latest version of the world’s first open foundation model for humanoid robot reasoning and skills.

    This post introduces the Isaac GR00T-Dreams blueprint, detailing its advanced capabilities and its role in developing the Isaac GR00T N1.5 foundation model.

    NVIDIA Isaac GR00T-Dreams blueprint overview

    The Isaac GR00T-Dreams blueprint is a reference workflow for generating vast amounts of synthetic trajectory data. This data is used for teaching humanoid robots to perform new actions in novel environments. 

    The blueprint enables robots to generalize across behaviors and adapt to new environments with minimal human demonstration data. As a result, a small team of human demonstrators can create the same amount of training data it would otherwise take thousands of people to produce.

    Video 1. Discover how robot brains explore future world states

    The GR00T-Dreams blueprint complements the Isaac GR00T-Mimic blueprint. By scaling up existing demonstration data for known tasks using NVIDIA Omniverse and the Cosmos Transfer-1 WFM, GR00T-Mimic helps the robot develop deep proficiency and become a specialist in those specific skills. GR00T-Dreams employs Cosmos Predict-2 and Cosmos Reason to generate entirely new data for new tasks and environments, working to make the robot a generalist with broad adaptability.

    GR00T-Dreams blueprint pipeline

    Offering a powerful “real-to-real” data workflow for training generalist robots, the blueprint uses real robot data to create synthetic trajectories, which are subsequently used to train physical robots. This approach significantly reduces the need for extensive human demonstrations. The process includes the steps outlined below.

    A flowchart diagram showing how teleoperated robot demonstration videos are collected, used to train a machine learning model, and then leveraged for automated action labeling and robot control.
    Figure 1. NVIDIA Isaac GR00T-Dreams blueprint architecture

    Step 1: Post-train with human demonstrations

    First, developers collect a limited set of human-teleoperated trajectories for a humanoid robot performing a single task, such as pick-and-place, in a single environment. This real-world data is then used to post-train the Cosmos Predict-2 WFM. This post-training step allows the model to learn specific movement capabilities and functional constraints that can be unique to that robot. 
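    As a rough illustration of what this step optimizes, the toy PyTorch sketch below trains a stand-in model on a future-frame prediction objective, which is how a world model absorbs a robot’s movement capabilities and constraints from demonstration video. The `ToyWorldModel` class, the random tensors, and all shapes are hypothetical stand-ins, not the Cosmos Predict-2 interface.

    ```python
    import torch
    import torch.nn as nn

    # Stand-in world model: predicts the next video frame from the current one.
    # The real Cosmos Predict-2 is a large generative WFM; this toy ConvNet
    # only illustrates the training signal (future-frame prediction).
    class ToyWorldModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 3, 3, padding=1),
            )

        def forward(self, frame):
            return self.net(frame)

    model = ToyWorldModel()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # Fake teleop clip: batches of (current_frame, next_frame) pairs standing
    # in for the collected human-teleoperated trajectories.
    frames_t = torch.rand(8, 3, 64, 64)
    frames_t1 = torch.rand(8, 3, 64, 64)

    for step in range(10):
        pred = model(frames_t)
        loss = nn.functional.mse_loss(pred, frames_t1)  # future-frame objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    ```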

    Step 2: Generate “dreams”

    Next, developers prompt the fine-tuned Cosmos model with an initial image and new text-based instructions for the robot to perform. The generative model then creates a vast number of diverse and novel task scenarios, or future world states (also called dreams), such as opening, closing, arranging objects, cleaning, and sorting. These scenarios are created in the form of 2D videos.
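    Conceptually, this step fans a single starting frame out into many generation requests. The sketch below shows that fan-out, with a hypothetical `generate_dream` placeholder standing in for the fine-tuned model’s video sampler; none of these names come from the actual Cosmos SDK.

    ```python
    from itertools import product

    # Placeholder for the fine-tuned WFM's video sampler; in the real pipeline
    # this would run generation and return a path to a 2D video clip.
    def generate_dream(initial_image: str, instruction: str) -> str:
        return f"dream({initial_image}, '{instruction}')"

    initial_frames = ["lab_scene_01.png"]  # a single starting image
    tasks = ["open the drawer", "close the laptop", "sort the fruit into the bowl"]
    variations = ["slowly", "with the left hand", "after sliding the bowl closer"]

    # One initial frame crossed with many instruction variants fans out into
    # a large batch of dream-generation requests.
    dreams = [
        generate_dream(img, f"{task} {var}")
        for img in initial_frames
        for task, var in product(tasks, variations)
    ]
    print(f"{len(dreams)} dream videos requested")  # 1 x 3 x 3 = 9
    ```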

    Step 3: Reason and filter

    Once a large number of dreams are generated, the Cosmos Reason model can be used to evaluate the quality and success of each dream. It filters out “bad” dreams, which depict unsuccessful or flawed task attempts, ensuring only the highest-quality and most relevant scenarios are selected for the next stage.
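    A minimal sketch of this quality gate is shown below. The `score_dream` function, the hard-coded scores, and the 0.7 threshold are illustrative assumptions; the real pipeline would query Cosmos Reason to judge each clip.

    ```python
    # `score_dream` stands in for Cosmos Reason, which would watch each clip
    # and judge whether the task was completed; the scores here are
    # hard-coded purely for demonstration.
    def score_dream(video_path: str) -> float:
        fake_scores = {"dream_a.mp4": 0.92, "dream_b.mp4": 0.31, "dream_c.mp4": 0.78}
        return fake_scores.get(video_path, 0.0)

    SUCCESS_THRESHOLD = 0.7  # assumed cutoff, not a documented Cosmos value

    candidates = ["dream_a.mp4", "dream_b.mp4", "dream_c.mp4"]
    accepted = [clip for clip in candidates if score_dream(clip) >= SUCCESS_THRESHOLD]
    print(accepted)  # ['dream_a.mp4', 'dream_c.mp4'] -- the "bad" dream is dropped
    ```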

    Step 4: Extract neural trajectories

    The selected dreams, which are initially just pixels in a 2D video, are then processed using an Inverse Dynamics Model (IDM), a generative AI model for action labeling, to generate 3D action trajectories. The model works by taking as input two image frames from the 2D video—a “before” and an “after”—and predicting the segment of actions that occur between them. 

    This critical step translates the visual information from the dream videos into actionable data that a robot can learn from. These 2D videos, now enriched with 3D action data, are called neural trajectories.
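    The toy PyTorch model below mirrors this input/output contract: two stacked frames in, a fixed-length segment of actions out. The architecture, the 7-DoF action space, and the 16-step segment length are all assumptions for illustration, not the actual IDM.

    ```python
    import torch
    import torch.nn as nn

    # Toy inverse dynamics model: given a "before" and an "after" frame,
    # regress the chunk of actions that connects them. The real IDM is a
    # trained generative action labeler; shapes here are illustrative.
    ACTION_DIM, CHUNK_LEN = 7, 16  # assumed 7-DoF arm, 16 actions per segment

    class ToyIDM(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(6, 16, 5, stride=4), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            self.head = nn.Linear(16, CHUNK_LEN * ACTION_DIM)

        def forward(self, before, after):
            x = torch.cat([before, after], dim=1)  # stack frames channel-wise
            return self.head(self.encoder(x)).view(-1, CHUNK_LEN, ACTION_DIM)

    idm = ToyIDM()
    before = torch.rand(1, 3, 128, 128)  # frame at the start of the segment
    after = torch.rand(1, 3, 128, 128)   # frame at the end of the segment
    actions = idm(before, after)
    print(actions.shape)  # torch.Size([1, 16, 7]): the predicted action segment
    ```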

    Step 5: Train the visuomotor policy

    Finally, these neural trajectories are used as a large-scale synthetic dataset to train visuomotor policies, either by co-training them alongside real-world data to enhance performance, or by training solely on them to enable generalization to novel behaviors and unseen environments.
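    As a sketch of the co-training setup, the snippet below mixes a small stand-in “real” dataset with a much larger synthetic one using a weighted sampler, so every policy-update batch contains both. The 25/75 mixing ratio and tensor shapes are assumptions, not a published recipe.

    ```python
    import torch
    from torch.utils.data import (ConcatDataset, DataLoader, TensorDataset,
                                  WeightedRandomSampler)

    # Stand-in datasets of (observation, action) pairs: a small set of real
    # demos and a much larger set of neural trajectories.
    real = TensorDataset(torch.rand(100, 32), torch.rand(100, 7))
    synthetic = TensorDataset(torch.rand(10_000, 32), torch.rand(10_000, 7))
    combined = ConcatDataset([real, synthetic])

    # Weight real samples so they still make up ~25% of each batch despite
    # being outnumbered 100:1 (an assumed ratio for illustration).
    weights = torch.cat([
        torch.full((len(real),), 0.25 / len(real)),
        torch.full((len(synthetic),), 0.75 / len(synthetic)),
    ])
    sampler = WeightedRandomSampler(weights, num_samples=len(combined))
    loader = DataLoader(combined, batch_size=64, sampler=sampler)

    obs, action = next(iter(loader))  # a mixed real/synthetic training batch
    print(obs.shape, action.shape)
    ```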

    Advanced capabilities for robot learning

    The GR00T-Dreams blueprint unlocks advanced capabilities for robot learning, including new behaviors, new environments, and more.

    New behaviors: Robots learn novel actions from language instructions, even with training data from only a single task (pick-and-place, for example).

    Side-by-side videos showcasing a humanoid robot opening a laptop from a first-person perspective and a third-person perspective.
    Figure 2. The neural trajectory and real-robot (Fourier GR-1) execution of a robot opening a laptop, enabled by GR00T-Dreams

    New environments: Robots generalize to completely unseen environments, even if the world model was trained in only one lab setting.

    Side-by-side videos showcasing a humanoid robot placing a tangerine fruit in a bowl from a first-person perspective and a third-person perspective.
    Figure 3. The neural trajectory and real-robot (Fourier GR-1) execution of a robot placing a tangerine fruit in a bowl, enabled by GR00T-Dreams

    Multiple robot types: The blueprint works across diverse robot embodiments, from humanoids to manipulators (such as Franka and SO-100), and supports multiple camera views.

    Side-by-side videos showing a Franka arm performing a cube-stacking task on the left and an SO-100 robot arm on the right performing pick-and-place tasks.
    Figure 4. Franka and SO-100 manipulators performing different manipulation tasks, enabled by GR00T-Dreams

    Enhanced learning for complex tasks: Augments training data for challenging, contact-rich tasks such as manipulating deformable objects (folding) or using tools (hammering), functioning as a real-to-real workflow from an initial real frame.

    Side-by-side videos showcasing a humanoid robot hammering from a first-person perspective and a third-person perspective.
    Figure 5. The neural trajectory and real-robot (Fourier GR-1) execution of a robot hammering, enabled by GR00T-Dreams

    Post-training GR00T N1.5 with GR00T-Dreams

    Vision-language-action (VLA) models can be post-trained using GR00T-Dreams to enable novel behaviors and operation in unseen environments.

    NVIDIA Research used the GR00T-Dreams blueprint to generate synthetic training data to develop GR00T N1.5 in just 36 hours. This process would have taken nearly three months using manual human data collection. 

    GR00T N1.5 is the first update to GR00T N1, the world’s first open foundation model for generalized humanoid robot reasoning and skills. This cross-embodiment model takes multimodal input, including language and images, to perform manipulation tasks in diverse environments.

    What’s new in GR00T N1.5:

    • Improved accuracy in comprehending language instructions
    • Enhanced generalization to new objects and environments, enabled by the Isaac GR00T-Dreams blueprint
    • Improved vision-language foundation with better spatial understanding and open-world visual grounding using Eagle 2.5
    • Higher success rate in material handling and manufacturing tasks

    Open NVIDIA Physical AI Dataset

    NVIDIA has expanded the open NVIDIA Physical AI Dataset collection, the most downloaded robotics dataset on Hugging Face. Initially launched in March 2025, the dataset now includes thousands of new robotics trajectories, featuring the first real-world training data from the Unitree G1 robot and 24,000 simulated teleoperation trajectories. 

    This collection, which also contains synthetic simulation data for various manipulation tasks, was instrumental in developing GR00T N1.5.
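    For reference, a snapshot of a dataset in the collection can be pulled locally with the Hugging Face Hub client, as sketched below. The repository ID shown is illustrative; confirm the exact identifiers on the NVIDIA organization page on Hugging Face.

    ```python
    from huggingface_hub import snapshot_download

    # Download one dataset from the Physical AI collection. The repo_id is
    # an assumed example, not a verified identifier.
    local_dir = snapshot_download(
        repo_id="nvidia/PhysicalAI-Robotics-GR00T-X-Embodiment-Sim",  # assumed ID
        repo_type="dataset",
    )
    print(f"Dataset downloaded to {local_dir}")
    ```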

    Ecosystem adoption of GR00T N models

    Early adopters of GR00T N models include AeiRobot, Foxlink, Lightwheel, and NEURA Robotics. 

    AeiRobot uses them to allow its industrial robots to understand natural language for complex pick-and-place tasks. Foxlink is leveraging the models to improve the flexibility and efficiency of its industrial robot arms. Lightwheel is utilizing them to validate synthetic data for the faster deployment of humanoid robots in factories. NEURA Robotics is evaluating the models to accelerate the development of its household automation systems.

    Get started accelerating robot learning

    The NVIDIA Isaac GR00T-Dreams blueprint gives developers a reference workflow for generating vast amounts of synthetic trajectory data, enabling humanoid robots to learn new actions and adapt to novel environments with minimal human demonstration data.

    To get started with GR00T-Dreams:

    To get started with GR00T N1.5:

    Stay up to date by subscribing to our newsletter and following NVIDIA Robotics on LinkedIn, Instagram, X, and Facebook. Explore NVIDIA documentation and YouTube channels, and join the NVIDIA Developer Robotics forum. To start your robotics journey, enroll in our free NVIDIA Robotics Fundamentals courses today.
