Hi everyone!

Check out something truly groundbreaking: Isaac GR00T N1 from NVIDIA – they're calling it the world's first open foundation model for general-purpose humanoid robot reasoning and skills! The goal here is to democratize Physical AI.

What's so special about this? It's a single neural network that goes from "photons to actions" – taking in images and language, and outputting continuous control signals for a robot. And it's designed to be general, not just for one specific task or robot.

They've trained it on a massive and diverse dataset:

Real humanoid teleoperation data.
Synthetic data generated in simulation (they're open-sourcing 300K+ trajectories!).
"Neural trajectories" – using video generation models to create even more training data with accurate physics.
Latent actions extracted from in-the-wild human videos.
They've even developed new algorithms to extract "action tokens" from videos.

The architecture is also interesting: it's a "System 1, System 2" setup. System 2 (a Vision-Language Model) understands the scene and the instructions, while System 1 (a Diffusion Transformer) handles the fast, precise motor control.
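To make the two-system split a bit more concrete, here is a toy sketch in plain Python. It is not NVIDIA's code or the GR00T N1 API. Every name in it (`system2_plan`, `system1_act`, `control_loop`, `replan_every`) is hypothetical, the "VLM" is a hand-written feature function, and the "diffusion" step is a simple iterative refinement from noise toward a context-dependent target. The assumption that System 2 updates less often than System 1 is mine, for illustration only.

```python
import random
from typing import List


def system2_plan(image: List[float], instruction: str) -> List[float]:
    """Hypothetical stand-in for the vision-language model.

    Fuses a (fake) image and a text instruction into a small context vector.
    """
    image_feature = sum(image) / len(image)
    text_feature = (sum(ord(c) for c in instruction) % 100) / 100.0
    return [image_feature, text_feature]


def system1_act(context: List[float], action_dim: int = 4,
                steps: int = 8, seed: int = 0) -> List[float]:
    """Hypothetical stand-in for the diffusion-style action head.

    Starts from Gaussian noise and refines it step by step into a
    continuous action vector conditioned on the context.
    """
    rng = random.Random(seed)
    # Toy "clean" action: a real model would predict the denoising
    # direction with a learned network instead of knowing the target.
    target = [context[i % len(context)] for i in range(action_dim)]
    action = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]
    for _ in range(steps):
        # Each step removes half of the remaining error.
        action = [a + 0.5 * (t - a) for a, t in zip(action, target)]
    return action


def control_loop(frames: List[List[float]], instruction: str,
                 replan_every: int = 5) -> List[List[float]]:
    """Run the slow planner occasionally and the fast controller every frame."""
    context: List[float] = []
    actions = []
    for t, frame in enumerate(frames):
        if t % replan_every == 0:  # assumed: System 2 runs at a lower rate
            context = system2_plan(frame, instruction)
        actions.append(system1_act(context, seed=t))
    return actions


if __name__ == "__main__":
    fake_frames = [[0.2, 0.4]] * 10
    for step, action in enumerate(control_loop(fake_frames, "pick up the cup")):
        print(step, [round(a, 3) for a in action])
```

The point of the sketch is only the data flow: images and language go into the slow module, a context vector comes out, and the fast module turns that context into continuous control values on every tick.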

NVIDIA is now empowering the next generation of humanoid robots with these open foundations. Don't underestimate the impact of this.

NVIDIA Isaac GR00T N1 - Open Foundation Model for Humanoids
