June 6, 2026

When bots review bots, wallets cry

Tokenomics: Quantifying Where Tokens Are Used in Agentic Software Engineering

AI coders are burning most of their budget arguing with themselves, and the comments are ruthless

TLDR: The study found AI software agents spend most of their token budget (the metered text they read and write) on reviewing and refining code, not writing it, which could make costs surprisingly high. Commenters mocked the waste, argued over the word “tokenomics,” and joked that future engineers may be judged on how cheaply they can make AI think.

The big reveal in this research paper is almost hilariously relatable: when teams of artificial intelligence helpers build software, they don’t spend most of their effort writing code. They spend it reviewing, re-checking, and second-guessing it. In the study, nearly 60% of all token use went into code review (tokens being the metered chunks of text these systems read and write, and the thing you actually get billed for). Translation for normal people: the expensive part isn’t the first draft, it’s the endless robot back-and-forth after.

And the community? Oh, they pounced. One commenter joked that coding agents love producing “thousands of unit tests” but somehow avoid testing things dynamically in the real world, which is the kind of roast only developers can deliver with a straight face. Another saw a new job title forming in real time: forget old-school infrastructure optimization, soon engineers may be hired for being good at saving AI tokens instead. That idea landed somewhere between prophecy and punchline.

Then came the naming drama. A commenter bristled at the paper using “tokenomics,” arguing that the word already belongs to crypto and AI should stop trying to steal it. Even the side stories had teeth: one person described a product demo that proudly showed a column for token usage, only for the room to immediately ask the killer question — who’s paying for all this? Suddenly the shiny AI feature looked a lot less magical.

So yes, the paper is about measuring cost. But the comments turned it into a far juicier story about waste, hype, and whether AI coding tools are brilliant assistants or just very expensive overthinkers.

Key Points

  • The study examines token consumption in LLM-based multi-agent software engineering systems to better understand operational efficiency and resource use.
  • Researchers analyzed execution traces from 30 software development tasks run in the ChatDev framework with a GPT-5 reasoning model.
  • ChatDev's internal phases were mapped to six standardized SDLC stages: Design, Coding, Code Completion, Code Review, Testing, and Documentation.
  • Preliminary findings show that Code Review is the largest source of token usage, averaging 59.4% of total consumption.
  • Input tokens represent the largest token category overall at an average of 53.9%, indicating possible inefficiencies in agent collaboration.
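
For the curious, here is roughly how percentages like those could be tallied from a run log. This is only a sketch under assumptions: the record layout (`stage`, `input_tokens`, `output_tokens`), the function names, and the toy numbers are all invented for illustration and are not the paper's data or ChatDev's actual trace format.

```python
from collections import defaultdict

# The six SDLC stages named in the study.
STAGES = ["Design", "Coding", "Code Completion",
          "Code Review", "Testing", "Documentation"]


def stage_shares(records):
    """Return each stage's percentage of all tokens (input + output)."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec["stage"]] += rec["input_tokens"] + rec["output_tokens"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {stage: 0.0 for stage in STAGES}
    return {stage: 100.0 * totals[stage] / grand_total for stage in STAGES}


def input_share(records):
    """Return the percentage of all tokens that were input (prompt) tokens."""
    total_in = sum(rec["input_tokens"] for rec in records)
    total = sum(rec["input_tokens"] + rec["output_tokens"] for rec in records)
    return 100.0 * total_in / total if total else 0.0


# Hypothetical toy trace, NOT figures from the study.
toy_trace = [
    {"stage": "Design", "input_tokens": 100, "output_tokens": 100},
    {"stage": "Coding", "input_tokens": 150, "output_tokens": 150},
    {"stage": "Code Review", "input_tokens": 400, "output_tokens": 100},
]

if __name__ == "__main__":
    for stage, pct in stage_shares(toy_trace).items():
        print(f"{stage}: {pct:.1f}%")
    print(f"Input tokens: {input_share(toy_trace):.1f}%")
```

In the toy trace, Code Review lands at half of all tokens mostly because of what it reads rather than what it writes, which is the same shape of problem the study flags when it points at input tokens.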

Hottest takes

"they really like to write thousands of unit tests but not dynamically test" — sakuraiben
"Maybe soon companies will look at how engineers can optimize the token efficiency of AI" — drivebyhooting
"Tokenomics is already a word used to describe cryptocurrency economics" — satvikpendem