June 29, 2026
Bird brain or brainy bird?
Ornith-1.0: Self-scaffolding LLMs for agentic coding
This coding AI says it can coach itself, and commenters are split between hype and side-eye
TLDR: Ornith-1.0 is a new coding AI that claims unusually strong results, especially a small version that performs like much larger systems. Commenters are torn between excitement over the huge performance claims and skepticism that this may be clever repackaging rather than a true breakthrough.
A new open-source coding AI called Ornith-1.0 has landed with a huge promise: instead of just trying to solve programming tasks, it also teaches itself how to tackle them. In plain English, it’s like a student who not only does the homework but invents its own study guide too. The creators say this trick helps even the smaller version punch way above its weight, with the tiny 9B model reportedly keeping up with much bigger rivals and the giant flagship model beating or matching some of the biggest names in coding tests.
But the real action is in the reactions. One commenter was stunned that the 9B model could allegedly deliver something close to a much larger Qwen model, calling that claim “bonkers.” That set the mood fast: half the crowd is impressed, half is squinting at the scoreboard like they’re checking for hidden fine print. Another commenter brought the vibe crashing back to earth by testing the model on security bug hunting and reporting that it did poorly with limited tools. Give it a full shell and Python, though, and it found twice as much—still not great, but enough for a grudging, “okay, maybe it actually is building tools for itself.”
And then came the skeptic energy: is this a breakthrough, or just prompt engineering in a fancy new outfit? One user basically asked whether Ornith is just a model that has been trained to always write and run code first. That sparked the classic tech-forum drama: genius new method or old trick with better branding? No big meme storm yet, but the tone is deliciously familiar—equal parts awe, nitpicking, and “someone please reproduce this before we crown it king.”
Key Points
- •Ornith-1.0 is an open-source family of agentic coding models built on pretrained Gemma 4 and Qwen 3.5, with variants from 9B Dense to 397B MoE.
- •The article’s main technical claim is a self-improving training framework that jointly learns task scaffolds and solution rollouts during reinforcement learning.
- •Ornith-1.0-397B is reported to score 77.5 on Terminal-Bench 2.1 and 82.4 on SWE-Bench Verified, outperforming or matching several leading models cited in the article.
- •Ornith-1.0-35B is reported to outperform similarly sized models and to beat Qwen 3.5-397B on Terminal-Bench 2.1 with 64.4 versus 53.5.
- •The article says self-generated scaffolds create a reward-hacking risk, such as satisfying a verifier by reading visible tests or hardcoding expected artifacts.