Computer Use Is 45x More Expensive Than Structured APIs

Turns out making an AI click around like a human is wildly pricey, and commenters are not shocked

TLDR: A test found that making an AI use a website like a human can cost about 45 times more than letting it use the app directly. Commenters mostly said "no kidding," but the bigger debate was whether software now needs to be redesigned for an AI-heavy future.

The benchmark dropped one juicy number and the internet instantly did what it does best: dunked, debated, and started planning the future of computing. In the test, an AI using direct app connections finished the job in 8 calls, while the screen-watching version needed a hand-held, 14-step walkthrough, took 14 to 22 minutes, and burned through mountains of tokens. Translation for non-experts: telling an AI to stare at a screen and click buttons like a person is dramatically slower and pricier than letting it talk to the app directly behind the scenes.

The loudest reaction was basically, "Well… obviously." One commenter joked this was the tech equivalent of announcing that the sky is blue and water is wet. Another compared it to shipping: direct connections are the efficient delivery network, while computer-use is the expensive "last mile" option you use only when you must. That became the thread’s big vibe: yes, screen-driving AI is clunky, but it still matters for locked-down software and old internal tools that offer no direct access.

Then came the bigger hot take. Some readers said this proves software itself may need a makeover for the AI era, with every app offering machine-friendly controls while still looking normal for humans. The spiciest leap? A commenter dreaming this could lead to OpenAI making its own phone to challenge Apple and Android. So the real drama wasn’t just the cost gap — it was the community asking whether today’s apps are already becoming yesterday’s design.

Key Points

  • The article benchmarks two AI-agent interaction methods on the same admin panel: vision-based browser control versus direct API calls.
  • Both approaches used the same Claude Sonnet model, the same dataset, and the same task, with only the interface changed.
  • The API agent completed the task in eight calls, while the vision agent initially failed to process all pending reviews because it did not paginate.
  • After the authors rewrote the vision prompt into a fourteen-step UI walkthrough, the vision agent completed the task but took about fourteen minutes and used around 500,000 input tokens.
  • The article reports that the vision path was run only three times because each run took 14 to 22 minutes and consumed roughly 400,000 to 750,000 tokens, while the API path was run five times.
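
For readers wondering what "paginate" means in practice, here is a minimal sketch of how a direct-API agent might walk every page of pending reviews in a handful of calls. It is not the benchmark's code: the endpoint, the `next_page` field, and the names `make_fake_endpoint` and `collect_all` are all made up for illustration, and the "API" is just an in-memory list.

```python
from typing import Callable, Optional


def make_fake_endpoint(total: int, page_size: int) -> Callable[[int], dict]:
    """Hypothetical stand-in for a paginated 'pending reviews' endpoint."""
    reviews = [{"id": i, "status": "pending"} for i in range(1, total + 1)]

    def fetch_page(page: int) -> dict:
        start = (page - 1) * page_size
        items = reviews[start:start + page_size]
        has_more = start + page_size < total
        # next_page is None on the last page (an assumed convention).
        return {"items": items, "next_page": page + 1 if has_more else None}

    return fetch_page


def collect_all(fetch_page: Callable[[int], dict]) -> tuple[list, int]:
    """Follow next_page until it runs out; return all items and the call count."""
    items: list = []
    calls = 0
    page: Optional[int] = 1
    while page is not None:
        response = fetch_page(page)
        calls += 1
        items.extend(response["items"])
        page = response["next_page"]
    return items, calls


if __name__ == "__main__":
    fetch = make_fake_endpoint(total=45, page_size=20)
    reviews, calls = collect_all(fetch)
    print(f"{len(reviews)} reviews fetched in {calls} calls")
```

A screen-driving agent has to find and click a "next page" control to get the same coverage, which is the step the vision agent reportedly skipped on its first attempt.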

Hottest takes

"the sky is blue and water is wet" — taormina
"computer use is like last mile delivery" — cjbarber
"the OS needs to be completely rethought" — aurareturn