May 6, 2026
AI gets notes, comments get spicy
Following the Text Gradient at Scale
AI researchers say scorecards are out and detailed feedback is in — and commenters say, “haven’t we seen this before?”
TLDR: Researchers say AI learns better from detailed written advice than from a single score, which could matter a lot for long, costly tasks. The comment section immediately pushed back with a familiar internet question: is this a real breakthrough, or just something older repackaged with shinier language?
The big pitch here is surprisingly easy to get: instead of grading an artificial intelligence system with a single lonely number — basically a gold star or a frown face — researchers want to give it actual written feedback about what went right and what went wrong. Their argument is that modern systems, especially ones working on long, expensive tasks, are being judged in a painfully wasteful way. In other words: stop telling the machine “4 out of 5,” and start telling it, “great top layer, but where are the cherries?”
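To make that pitch concrete, here is a minimal sketch in Python of the two grading styles. Everything in it is illustrative: the `llm`, `scalar_judge`, and `verbal_judge` helpers and their prompts are assumptions, not code from the article, and the OpenAI SDK is just one possible backend.

```python
# Minimal sketch of "one lonely number" vs. written feedback.
# Assumes the OpenAI Python SDK (>= 1.0) purely as an example backend;
# any chat-completion API could stand in behind llm().
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def llm(prompt: str) -> str:
    """Send a single-turn prompt to a chat model and return the text reply."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content or ""

def scalar_judge(artifact: str) -> float:
    """The status quo: collapse the evaluator's judgment into one number."""
    reply = llm(f"Rate this on a 1-5 scale. Reply with only the number:\n\n{artifact}")
    return float(reply.strip())  # e.g. 4.0 -- everything else the judge noticed is lost

def verbal_judge(artifact: str) -> str:
    """The pitch: keep the full written critique, so the learner knows what to fix."""
    return llm(
        "Critique this artifact. Name specific problems and suggest concrete fixes:\n\n"
        f"{artifact}"
    )
```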
But in the comments, the mood was less “mind blown” and more “hold on, this sounds familiar.” The standout reaction came from one reader who casually dropped a Nature link and basically said this already looks a lot like what BindCraft does in drug discovery, just without one extra step. That instantly shifts the vibe from pure breakthrough hype to a classic internet showdown: new frontier or old trick with better branding?
That’s the drama in a nutshell. The article frames this as a major rethink of how AI learns, ditching crude scores for richer advice. Meanwhile, the community’s hottest take is the timeless tech-comment-section favorite: “Cool idea, but is it actually new?” Even with only one comment, the energy is unmistakable — equal parts impressed, skeptical, and ready to fact-check the hype before dessert is served.
Key Points
- The article argues that reinforcement learning often discards rich evaluator feedback by compressing it into a single scalar reward.
- It presents detailed verbal feedback as more actionable than numeric rewards because it can identify specific problems and suggest concrete fixes.
- The article says this scalar bottleneck is especially costly for long-running LLM tasks that produce rich diagnostic information such as tool logs and error traces.
- It describes an emerging text-based optimization paradigm that directly uses natural-language feedback to revise artifacts like prompts, code, and molecule specifications (a rough sketch of such a loop follows this list).
- The article highlights Feedback Descent as recent work that reportedly outperforms specialized RL methods in molecular design and prompt optimization.
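As a rough illustration of the critique-then-revise loop that text-based optimization implies (not the actual Feedback Descent algorithm, whose details the article doesn't spell out), the `llm()` helper from the earlier sketch can be run in a cycle: critique the current candidate in plain language, then feed that critique straight into a rewrite prompt. The point of the sketch is that the revision step consumes the feedback directly instead of squeezing it through a single reward value first.

```python
def optimize_with_text_feedback(artifact: str, task: str, steps: int = 5) -> str:
    """Critique-then-revise loop, illustrative only.
    Reuses the llm() helper defined in the earlier sketch."""
    for _ in range(steps):
        # 1) Ask for written feedback instead of a score.
        feedback = llm(
            f"Task: {task}\n\nCandidate:\n{artifact}\n\n"
            "List concrete problems with this candidate and how to fix each one."
        )
        # 2) Revise the artifact using that feedback verbatim.
        artifact = llm(
            f"Task: {task}\n\nCandidate:\n{artifact}\n\nFeedback:\n{feedback}\n\n"
            "Rewrite the candidate, applying every fix. Reply with the rewrite only."
        )
    return artifact

# Example: iteratively tightening a prompt instead of scoring it 4/5 and guessing.
better_prompt = optimize_with_text_feedback(
    artifact="Summarize the document.",
    task="Write a prompt that elicits faithful, citation-backed summaries.",
)
```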