February 19, 2026

Bots on autopilot, humans on edge

Measuring AI agent autonomy in practice

Anthropic says its bot works longer — commenters cry "bad metrics" and "privacy sus"

TLDR: Anthropic says its coding agent now runs autonomously for 45 minutes and often asks for clarification. Commenters slam the metric as meaningless, question the "privacy-preserving" data use, and warn that the real risk is authorization, not raw power. Those concerns matter as agents creep into sensitive industries.

Anthropic dropped a study claiming its coding agent, Claude Code, is flying solo for longer, jumping from under 25 minutes to over 45, while experienced users hit auto-approve more and interrupt only when needed. It also says the bot asks for clarification more often than humans stop it, and that most activity is low-risk coding, with early pokes into healthcare, finance, and cybersecurity.

Cue the comment-section chaos. Havoc torched the stopwatch stats, basically yelling that minutes mean nothing if you don't account for speed and quality, joking they could get a Raspberry Pi to slog for six hours on a task Groq speedruns in 20 seconds. Privacy alarms blared as prodigycorp insisted Anthropic's "privacy-preserving" data collection doesn't pass the sniff test. The governance crowd, led by saezbaldo, went full red flag: the study measures capability, not what the bot is actually allowed to do, and authorization always lags behind power. Esafak stirred a mini-mystery: why the usage dip until Opus launched? Meanwhile, swyx dropped a writeup for the "let's dig deeper" crew. The vibe: interesting charts, shaky metrics, and one big question: are we measuring the right things before agents wander into high-stakes jobs?

Key Points

  • Claude Code’s autonomous run time in long sessions nearly doubled in three months, from under 25 to over 45 minutes.
  • Experienced users increasingly use full auto-approve (from ~20% to over 40% of sessions) while intervening when necessary.
  • On complex tasks, Claude Code asks for clarification more than twice as often as humans interrupt it.
  • Most agent actions on the public API are low-risk and reversible; ~50% of activity is in software engineering, with emerging use in healthcare, finance, and cybersecurity.
  • The study defines agents as AI systems equipped with tools to take actions and recommends new post-deployment monitoring and human-AI interaction paradigms for managing autonomy and risk.

Hottest takes

"It's a gibberish measurement in itself if you don't control for token speed" — Havoc
"you cant convince me that what they are doing is 'privacy preserving'" — prodigycorp
"This measures what agents can do, not what they should be allowed to do" — saezbaldo
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.