June 11, 2026
Open source or open season?
Open Reproduction of DeepSeek-R1
AI fans cheer the open clone—while critics yell it’s already old news
TLDR: Open R1 says it has completed the first major step toward recreating DeepSeek-R1 in public, releasing datasets and a training recipe anyone can inspect. Commenters, however, instantly turned it into a debate over whether the project is impressive, outdated, or already beaten by better-known open AI efforts.
A group of developers says it’s building a fully open version of DeepSeek-R1, the buzzy reasoning AI model, so anyone can study it, copy it, and build on it. On paper, that’s a big deal: the project has released datasets, training recipes, and a roadmap to recreate the model step by step. The latest brag is that Step 1 is done, including a giant “Mixture-of-Thoughts” dataset with 350,000 verified reasoning examples. In plain English: they’re trying to turn a secretive AI magic trick into a recipe card the public can actually use.
But in the comments, the real action was less "wow" and more "hang on, is this already outdated?" One of the loudest reactions was pure timestamp panic: users noticed that the repo looked stale at first glance, and one person all but begged for "2025" to be added to the title so readers would stop assuming the project was abandoned. Others were even harsher, dropping drive-by dismissals like "Too old now," the kind of comment that lands like a reality-show slap.
Then came the open-source cage match. Some commenters pointed people toward rivals like OpenThoughts, saying it already has a popular dataset and stronger small models. Others name-dropped OLMo and Nemotron as better examples of open-source AI training done right. And of course, someone asked the question lurking behind every ambitious AI project: how much money does this actually cost? So yes, the repo is impressive, but the crowd is treating it like a tech talent show, and the judges are ruthless.
Key Points
- Open R1 is an open-source project intended to reproduce the DeepSeek-R1 pipeline and provide the missing components so others can build on it.
- The repository includes scripts for GRPO training, supervised fine-tuning, and synthetic data generation, plus a Makefile to run pipeline steps.
- The project roadmap has three stages: reproducing the R1-Distill models, reproducing the RL pipeline behind R1-Zero, and showing multi-stage training from base model to RL-tuned model.
- A 2025/05/26 update states that Step 1 was completed with the release of the Mixture-of-Thoughts dataset and a recipe for OpenR1-Distill-7B.
- Installation guidance specifies CUDA 12.4, Python 3.11, vLLM 0.8.5.post1, FlashAttention, PyTorch 2.6.0, and authentication with Hugging Face and Weights & Biases.
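Version pins like the ones in the last point can break a training setup when one of them drifts. Below is a minimal sketch of a pre-flight check against those pins. It assumes the packages are published on PyPI under the distribution names `torch`, `vllm` and `flash-attn`; these names are assumptions and are not taken from the repo. `check_environment` and `installed_version` are hypothetical helpers, not part of Open R1.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Pins from the installation guidance above. None means "must be present,
# no exact version given" (the summary lists FlashAttention without one).
# The distribution names are assumptions about the PyPI packages.
EXPECTED = {
    "torch": "2.6.0",
    "vllm": "0.8.5.post1",
    "flash-attn": None,
}
EXPECTED_PYTHON = (3, 11)


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_environment(expected=EXPECTED, python=None, lookup=installed_version):
    """List mismatches against the pins; an empty list means everything matches."""
    problems = []
    major, minor = tuple(python or sys.version_info)[:2]
    if (major, minor) != EXPECTED_PYTHON:
        problems.append(f"python: want 3.11, have {major}.{minor}")
    for name, want in expected.items():
        have = lookup(name)
        if have is None:
            problems.append(f"{name}: not installed")
            continue
        # Drop a local build tag such as "+cu124" before comparing.
        base = have.split("+", 1)[0]
        if want is not None and base != want:
            problems.append(f"{name}: want {want}, have {have}")
    return problems


if __name__ == "__main__":
    # CUDA 12.4 and the Hugging Face / W&B logins are not checked here.
    for issue in check_environment():
        print(issue)
```

When run as a script, it prints one line per mismatch and prints nothing when the pins match. The `lookup` and `python` parameters exist only so the check can be exercised with a fake version table. The exact-match comparison is a deliberate simplification, and the repo's own install instructions remain the authority.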