GEN-0 / Embodied Foundation Models That Scale with Physical Interaction

Robot folds box like a pro; hype vs "how does it know"

TLDR: A new robot model, GEN-0, claims it scales by learning from real physical interaction and can “think while moving,” even folding tricky box flaps. Commenters are split: wowed by the skill, but skeptical about autonomy and about how the robot is told its goals. Either way, it’s a big moment for real-world robot smarts.

GEN-0 just dropped a flex: a robot brain that learns by actually touching stuff, not just reading and watching. In demos, it builds a camera kit and nails the tiny box flap that frustrates actual humans. The company says bigger models get smarter, hitting an “intelligence threshold” around 7B parameters and scaling beyond, with “Harmonic Reasoning” letting the bot think while it moves. Translation: it doesn’t pause to plan—physics doesn’t wait. The internet reaction? Awe meets side-eye.

One camp is screaming “witchcraft!” after seeing the flap folded on the first try—tyushk admits they can’t even do that themselves. Another camp asks the practical stuff: amluto wants to know how the robot is told its goal—text prompt, vision, vibes? Threads ignited with “is this fully autonomous or choreographed?” debates, plus jokes about “robot puberty at 7B” and the rise of IKEA-bot 3000. Fans love the cross-robot demos and the claim of 270,000 hours of hands-on data, growing fast. Skeptics want proof it isn’t just memorizing one fancy unboxing. Meanwhile, meme-lords crowned GEN-0 the “world’s fastest intern,” and productivity bros declared end times for fiddly tasks. Whether you’re cheering or clutching your Allen key, the vibes are loud.

Key Points

  • GEN-0 is introduced as an embodied foundation model trained on high-fidelity physical interaction data.
  • The system exhibits scaling laws, with performance improving predictably with more pretraining data and compute.
  • A phase transition is reported at around 7B parameters; larger models continue to improve while smaller ones ossify.
  • Harmonic Reasoning enables concurrent thinking and acting without System 1–System 2 architectures or inference-time guidance.
  • GEN-0 is cross-embodiment and has been tested on 6DoF, 7DoF, and 16+DoF robots; pretraining uses a 270,000+ hour dataset growing by ~10,000 hours/week.
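For readers wondering what “scaling laws” means in this context: it’s the standard foundation-model claim that loss falls predictably as a power law in model size and data. The form below is the generic Kaplan-style law, shown only as a sketch—GEN-0’s actual constants and exponents are not public, and the symbols here are illustrative, not reported figures:

```latex
% Illustrative power-law scaling (generic form, NOT GEN-0's reported fit):
% N = parameter count, D = hours of interaction data,
% N_c, D_c, \alpha_N, \alpha_D = hypothetical fitted constants.
L(N, D) = \left(\frac{N_c}{N}\right)^{\alpha_N} + \left(\frac{D_c}{D}\right)^{\alpha_D}
```

Under a law like this, doubling data or parameters buys a predictable drop in loss, which is why the reported ~7B “phase transition” is notable: a sharp threshold is exactly what a smooth power law does not predict.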

Hottest takes

"If it really is fully autonomous, that first video is insane" — tyushk
"I struggle to put those little tags into the slot" — tyushk
"I’m curious how they prompt the model or otherwise tell it what its goal is" — amluto
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.