Effective harnesses for long-running agents

Anthropic puts a leash on long-running AIs; fans cheer, skeptics roll eyes

TLDR: Anthropic proposes a two-agent “harness” to keep AI work steady across sessions and leave clean handoffs. Commenters love the tidy scaffolding but debate whether this fixes the hard last 30% or just rebrands project management, with jokes and test-driven wishlists fueling the drama.

Anthropic’s latest brainchild is a “harness” for AI agents — basically a way to keep a bot on task over hours or days without forgetting everything between sessions. The plan: an initializer sets the stage, then a coding agent works in small steps and leaves a tidy trail for the next shift. It’s meant to fix the classic AI amnesia problem and build real apps instead of half-baked demos, all inside the Claude Agent SDK with code in the quickstart.

Cue the comment fireworks. The top vibe: “Cool demo, but the last mile still hurts.” One veteran summed it up as the 70/30 rule — easy to get impressive fast, brutal to finish — warning that execs will see shiny progress and think it’s done. Another camp says Anthropic is reinventing a project tracker, bragging they’ve already wired up tools like Plane, docs, and workflows to wrangle their bots. A practical aside popped off: using structured files like JSON helps keep the model from wrecking Markdown notes — tiny tweak, big win. The meme machine delivered with “BDSM for LLMs,” because of course it did. And the testers chimed in: could plain‑English testing frameworks like BDD (behaviour-driven development) help keep the bot honest?
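That structured-files aside is easy to see in miniature. Below is a rough sketch (not from Anthropic's post; the `handoff.json` filename and `append_note` helper are made up) of why JSON handoff notes survive agent edits better than free-form Markdown: each update round-trips the whole parsed structure, so a malformed write fails loudly instead of silently mangling the notes.

```python
import json
from pathlib import Path

NOTES = Path("handoff.json")  # hypothetical handoff file name

def append_note(note: str) -> None:
    """Parse, modify, and re-serialize the whole structure instead of
    string-editing a Markdown file: corruption raises a JSONDecodeError
    immediately rather than quietly degrading the notes over sessions."""
    data = json.loads(NOTES.read_text()) if NOTES.exists() else {"notes": []}
    data["notes"].append(note)
    NOTES.write_text(json.dumps(data, indent=2))

append_note("auth module: tests passing")
append_note("TODO: wire up rate limiting")
print(json.loads(NOTES.read_text())["notes"])
```

Tiny tweak, as the commenter said, but it turns "hope the model didn't break the notes" into a hard parse-time check.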

The split is clear: some see smart scaffolding, others see a fancy leash. But everyone agrees — if agents stop declaring “done” mid‑project, that’s real progress.

Key Points

  • Anthropic identifies persistent challenges for long-running AI agents working across multiple context windows without memory.
  • The Claude Agent SDK includes compaction for context management, but this alone does not ensure reliable multi-session progress.
  • Observed failures include attempting one-shot builds that exhaust context and leaving half-implemented, undocumented features.
  • Another failure mode is agents prematurely declaring projects complete after partial progress.
  • Anthropic proposes an initializer agent to set up the environment and a coding agent to make incremental progress and maintain a clean, merge-ready state, with examples in a quickstart repo.
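The two-agent pattern above can be reduced to a plain-Python sketch. This is not the Claude Agent SDK's API (see the quickstart repo for the real thing); the `progress.json` filename, task list, and helper names are illustrative, and the actual model calls are omitted. The point is the shape: an initializer seeds durable state once, then each coding session reads the handoff, makes one increment, and records it, so "done" is only declared when the plan is exhausted.

```python
import json
from pathlib import Path

STATE = Path("progress.json")  # hypothetical handoff file name

def initialize():
    """Initializer agent's job, reduced to its essence: set up the
    environment and seed a structured state file for later sessions."""
    if not STATE.exists():
        STATE.write_text(json.dumps(
            {"plan": ["scaffold app", "add feature", "write tests"],
             "done": []}, indent=2))

def coding_session():
    """One coding-agent session: read the handoff, take a single
    incremental step, and record it so the next session can resume."""
    state = json.loads(STATE.read_text())
    remaining = [t for t in state["plan"] if t not in state["done"]]
    if not remaining:
        return None  # every planned task finished: only now is it "done"
    task = remaining[0]
    # ... real agent work on `task` would happen here (SDK calls omitted) ...
    state["done"].append(task)
    STATE.write_text(json.dumps(state, indent=2))
    return task

initialize()
while (task := coding_session()) is not None:
    print(f"session completed: {task}")
```

Because progress lives in the file rather than the context window, the loop survives the "AI amnesia" the article describes: kill it between iterations and the next run picks up exactly where the last one left off.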

Hottest takes

"BDSM for LLMs" — slurrpurr
"get 70% of the way there… The problem is the remaining 30%" — roughly
"attempting to reinvent a project tracker" — _boffin_
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.